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1.0 INTRODUCTION: 


This document contains the description and specifications for a digital 
avionics design and reliability analyzer. It is the result of the study done 
by Martin Marietta concerning the use of emulation for investigating 
reliability and fault-tolerance issues for proposed highly reliable commercial 
digital avionics systems. The study was contracted by the NASA Langley 
Research Center because of the coming technology in commercial aircraft, which 
largely precludes traditional approaches to certification. 

Airframes for the 1990 *s are designed to be much more fuel efficient than 
current designs, but this fuel efficiency is bought at a price of less 
stability. To maintain safe flight, very reliable avionics computers are 
envisioned to allow the necessary quick reaction times and continuous 
monitoring of flight parameters. 

The computers are designed to break down so rarely (less than once in a 
human lifetime) that conventional bench and field tests cannot certify their 
reliability. The Federal Aviation Administration is in the process of 
adopting new certification procedures that emphasize mathematical models and 
simulations of the system over actual tests. To put the effort in 
perspective, the computers will be predicted to break down less often than the 
wings are expected to fall off planes in flight. The new avionics computers 
must be significantly more reliable than today’s avionics computers. They 
could function unattended, despite hardware or software failures for at least 
a 10-hour flight. This super-reliability will be gained through redundant 
hardware and software. Faults that occurred will be counteracted 
automatically by hardware and/or software algorithms. As these highly fuel 
efficient aircraft would fly 100 percent of the time in critically stable 
conditions, control of the aircraft must be maintained concurrently with the . 
fault detection and correction process. Further, any faults occurring during 
the recognition and correction of a previous fault must be handled as well. 

The hardware/ software configuration described in this document is referred 
to as the Digital Avionics Design and Reliability Analyzer. Its basic 
function is to provide for the simulation and emulation of the various 
fault- tolerant digital avionic computer designs that are developed. It has 
been established that hardware emulation at the gate-level will be utilized. 
The primary benefit of emulation to reliability analysis is the fact that it 
provides the capability to model a system at a very detailed level. This 
means that rather than basing reliability analyses on manufacturer’s supplied 
data, or on expected probability distributions of failures of parts to 
determine the response of a system, detailed models of a system may now be 
employed on an experimental basis and system responses to faults observed 
rather than predicted. Emulation allows the direct insertion of faults into 
the system, rather than waiting for actual hardware failures to occur. This 
allows for controlled and accelerated testing of system reaction to hardware 
failures. 

This report, has two primary sections. Section 3 is a description of the 
functions of the system. This is intended to provide a perspective of the 
system for the specification which follows in Section 4. Section 4 contains 
the more definitive hardware and software requirements necessary to achieve 
the goals and functions given in Section 3. 
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There are two Appendices and one Attachment. Appendix A is the trade 
study which leads to the decision to specify a two machine system, including 
an emulation computer connected to a general purpose computer. Appendix B is 
an evaluation of potential computers to serve as the emulation computer. 
Attachment I is the previously delivered Interim Technical Report. This 
report details the feasibility study and describes in some detail the NASA 
Langley gate level algorithm which provided the basis for most of the 
performance figures required in the specification. 


2,0 APPLICABLE DOCUMENTS 


2.1 Reference Documents 

1) Feasibility Study Report, Digital Avionics Design and Reliability 
Analyzer, November 1979. 

2) Interim Technical Report, Digital Avionics Design and Reliability 
Analyzer, February 1980. 

3) System Design Progress Report, Digital Avionics Design and 
Reliability Analyzer, July 1980. 

2.2 Standards 

1) Electronics Industries Association Standard RS-449 

2) Federal Standard 1031 

3) Electronics Industries Association Standard RS-232-C 

4) American National Standards Institute X. 3. 9-1966 

5) Federal Information Processing Standards Publication 1 

6) Federal Information Processing Standards Publication 2 

7) Federal Information Processing Standards Publication 3-1 

8) Federal Information Processing Standards Publication 25 

9) Federal Information Processing Standards Publication 16 

10) Federal Information Processing Standards Publication 17 

11) Federal Information Processing Standards Publication 18 

2.3 Other 

To be furnished by the Government 
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3.0 SYSTEM FUNCTIONAL/ OPERATIONAL DESCRIPTION 

3.1 Introduction 

This section is intended to provide an overall description of what the 
system (including the analyst) must do without regard to the elements; 
hardware, software or manual procedures, which allow it to be done. The 
emphasis in this section is on the logical functions required for the digital 
avionics design and reliability analyzer. To express these functions, we use 
structured analysis tools and notation.^ The notation which will be used 
throughout this section is based on three elements: data flow diagrams, 

mini-specifications, and the data dictionary. 


3.1.1. Data Flow Diagrams 


Data Flow Diagrams (DFD) are used to present the system pictorially thus 
reducing the amount of narrative needed. A DFD is a network representation of 
a system. The system may be automated, -manual, or mixed. The DFD portrays 
the system in terms of its component functional pieces with all interfaces 
among the components indicated. A DFD does not represent the flow of control 
or the order of processing. Numbers used on the diagrams are for 
identification purposes only. Data Flow Diagrams are made up of four basic 
elements: 


1 ) 


Data flows 
packets of 


represented by uamed vectors, are pipelines 
information of known composition flow. 


hh-roiiah wh i r*h 


2) Processes, represented by bubbles, are transformations of incoming 

data flow(s) into outgoing data flow(s). Each process bubble needs a 
descriptive name. 


3) Data stores, represented by two straight horizontal lines, are 

temporary repositories of data and may consist of tapes, discs, card 
sets, index files, data bases, or even someone's memory. 


4) Data sources and sinks, represented by boxes, are persons, 

organizations, or other entities lying outside the context of a 
system, that are net originators or receivers of system data. A 
source box exists only to provide commentary about the system's 
connection to the outside world. 


Data Flow Diagrams are expressed in levels. The first level, called the 
Context Diagram is labeled Diagram 0 and portrays an overall picture of the 
system with subsystems shown. These subsystems are labeled 1 through N. The 
subsystems are broken down in separate DFDs and further described. The 
components of the first subsystem are labeled l.l, 1.2, 1.3, etc. When a 
subsystem has been decomposed to as simple a form as necessary, it is called a 
functional primitive. 


1. Tom DeMarco, Structured Analysis and System Specification . New York; 
Yourdon, 1978. 
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There are many advantages to using leveled Data Flow Diagrams. They allow 
a top-down approach to analysis. By reading the top few levels one can get 
the big picture, or one can begin with the abstract and go to the detailed and 
narrow in on particular areas of interest. Each page is a complete 
presentation of the area of work allocated to it. All diagrams can be 
restricted to 8 1/2 X II inch paper. 

3.1.2 Mini-Specification 

The second part of the system functional definition consists of the 
Mini-Specifications which are concise descriptions of the bottom-level bubbles 
(functional primitives). Each Mini-Spec describes rules governing 
transformation of data flows arriving at the associated primitive into data 
flows leaving it. 

3.1.3 Data Dictionary 

To augment the Data Flow Diagram, there is an entity called the Data 
Dictionary. This contains rigorous definitions of all Data Flow Diagram 
elements such as data flows, components of data flows, files, and processes. 
These definitions relate all data elements through sequence, selection, or 
iteration. 

3.1.4 The structured analysis information in this section is augmented as 
necessary by textual material to highlight important points. 

3.2 Usage Phases 

The digital avionics design and reliability analyzer is intended to 
support three primary uses: 

1) Reliability analyses 

2) Failure effects analyses 

3) Conventional performance analyses 

Regardless of their differences, each of these has several characteristics 
in common with the others. Primary among these commonalities is the fact that 
each involves data gathering which is facilitated by the technology of 
emulation. As shown in Figure 3-1, there are 3 basic phases of each use. 

These phases are: 



1) 

Test design 


2) 

Test execution 


3) 

Data reduction/ analysis 
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These phases are shown in a different form in the Context Data Flow 
diagram given in Figure 3-2. In this diagram, the results of each phase are 
shown. Test design encompasses processes 1 and 3, test execution is process 2 
and data reduction is process 4. Model building, process 1, is an inherent 
part of test design and so is not considered a separate phase in itself. 

3.3. Test Design Phase 

The modeling part of the Test Design Phase is shown graphically in Figure 
3-3, and described in the process descriptions. One key concept which needs 
highlighting is the division of a system into functional blocks. This 
partitioning is necessary due to the time constraints of emulating at the gate 
level. Based on the results of the feasibility study (see Attachment 1), it 
is impossible to emulate the gate structure of the entire system under test. 
Thus the mixed mode concept, where the system is simulated at a functional 
level until a fault is inserted at which time the functional simulation of the 
affected block is replaced with a gate level emulation of that block. 

Following Figure 3-3, Figure 3-4, and Figure 3-5 are mini-specifications 
describing each process shown in these figures. 

One other concept not shown explicitly concerns the redundant computations 
which occur in a fault tolerant computing system. In a model, there is no 
necessity of actually performing redundant operations until one of the 
redundant paths errs (due to the introduction of a fault). This concept 
arises also during the Test Execution. 

3.4 Test Execution Phase 

The Test Execution Phase is shown in Figure 3-5. As noted in 3.3, the 
actual execution uses a combination of functional level simulation and gate 
level emulation of the machine under test. 
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PROCESS: 1.1. Subdivide System 
;PROCESS SPECIFICATION 

IF model - informat ion CONTAINS "gate- 1 eve 1 -model -needed” THEN 
CREATE funct ional -block USING system-block-diagrams 
DEFINE system-boundry USING funct ional -bl ock 
DEFINE 1 nterna 1 - i nterf aces USING ( funct i ona 1 -bl ock AND 

system-bl ock -diagrams) 

ELSE 

DEFINE i nternal - i nterf aces USING system-block-diagrams 
ENDIF 

DEFINE external - interfaces USING system-block-diagrams 
ENDPROCESS 


PROCESS: 1.2, Produce Gate Level Model 
. ;PROCESS SPECIFICATION 

IF model -Informat ion CONTAINS "gate- 1 eve 1 -model -needed” THEN 
FOR EACH functional -block IN system-boundr i es DO 

CREATE block-gate-model USING system- 1 ogi c-d i agrams 
TRANSLATE block-gate-model TO bl ock-gate- tabl e 
ENDFOR 

ASSEMBLE gate- 1 evel -mode 1 FROM bl ock-gate- tabl es 
ENDIF 

ENDPROCESS 


PROCESS: 1.3. Produce Functional Model 
-.PROCESS SPECIFICATION 

FOR EACH funct ional -block IN system-boundr i es DO 

CREATE ( bl ock- funct i onal -model AND i nterf ace-behav i or -mode 1 ) USING 

( system- funct i onal -descr ipt ion AND i nterna 1 - i nterf aces ) 
TRANSLATE (bl ock- funct i onal -model AND i nter f ace-behav i or-model ) TO 

(functional - level -simulation-code AND functional - level -symbol -table) 

ENDFOR 

ASSEMBLE funct i onal - 1 evel -model FROM ( funct ional - 1 evel -s imu 1 at ion-code AND 

funct Ional - 1 evel -symbol - table ) 

CREATE code-generation-description USING system-functional -descript ion 
ENDPROCESS 


ORIGINAB P^GE IS 
OF POOR QUALITY 
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PROCESS: 1.4, Define Model Specifics 
;PROCESS SPECIFICATION 

IF model -type -needed CONTAINS "gate- 1 eve 1 -mode 1 -needed" THEN 
SET model - informat ion TO "gate- 1 evel -model -needed" + 

" mode 1 - subd 1 v i s i on - needed " 

ELSE 

SET model - i nformat ion TO "monol i thic-model -needed" 

ENDIF 

ENDPROCESS 


i 
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PROCESS: 1.5, Produce Model Interconnection 
;PROCESS SPECIFICATION 
FOR EACH system-boundry DO 

DEFINE boundry- informat ion USING (system-block-diagrams AND 

system- 1 ogic*di agrams ) 

ENDFOR 

ENDPROCESS 


PROCESS: 1.6, Produce Loadab 1 e Software 
;PROCESS SPECIFICATION 
FOR EACH test-software DO 

TRANSLATE test-software TO (machine-object-code AND symbol -table) 
USING code-genera t ion-description 

ENDFOR 

ASSEMBLE loadable-software FROM (machi ne-object-code AND symbol -tabi e) 
ENDPROCESS 


PROCESS: 1.7, Produce Environmental Model 
;PROCESS SPECIFICATION 

CREATE env 1 ronmental -model -descr ipt ion USING 

system-environmental -description 

TRANSLATE environmental -model -descr ipt ion TO executabl e-envi ronmental -model 
ENDPROCESS 
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PROCESS: 3.1. Determine Data to be Collected 
;PROCESS SPECIFICATION 

IF type-resul t -needed = reliability-number THEN 
DETERMINE conf idence- level -desired 

CALCULATE number -of - samp 1 es -necessary FROM conf i dence- 1 evel -des i red 
DETERMINE type -da t a -necessary /* for statistical reduction */ 

DETERMINE ( type-of -f a i 1 ure-des i red AND desired-failure-distribution) 
ELSE 

IF type- resu 1 t - needed = f ai lure-ef f ects-anal ys i s THEN 
DETERMINE number -of -samp! es- necessary 

DETERMINE type-data-necessary FROM f a i 1 ure-mode-of - 1 nterest 
FOR EACH number-of -sampl es-necessary DO 
DETERMINE type-of -fai lure-desired 
ENDFOR 
ELSE 

IF type-result-needed ® performance-characteristic THEN 

DETERMINE type-data-necessary /♦ for specific characteristics ♦/ 
END IF 
END IF 
ENDIF 

ENDPROCESS 


PROCESS: 3.2, Define Model Characteristics 
;PROCESS SPECIFICATION 

IF type-data-necessary IN data-desired CONTAINS "gate-performance" THEN 
SET model - type- needed TO "f unct ional -model -needed" + 

"gate- level -model -needed" 

ELSE 

SET model - type-needed TO "f unct ional -model -needed" 

ENDIF 

ENDPROCESS 


PROCESS: 3.3, Determine Instrumentation Points 
;PROCESS SPECIFICATION 

DETERMINE instrumentation-points IN (executabl e-envi ronmental -model AND 

funct ional - 1 evel -model ) USING 
type-data-necessary IN data-desired 

IF type-data-necessary IN data-desired CONTAINS "gate-performance" THEN 
DETERMINE instrumentation-points IN gate- level -model 
ENDIF 

ENDPROCESS 
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PROCESS: 3.4, Define Data Recording Directives 

;PROCESS SPECIFICATION 

FOR EACH Instrumentation-point DO 

DEFINE data-recordi ng-di rect i ves USING data-desired 
ENDFOR 

ENDPROCESS 


PROCESS: 3,5, Define Test Sequence 
;PROCESS SPECIFICATION 

IF type-resul t -needed = reliability-number THEN 
FOR EACH number-of -samples-necessary DO 

DETERMINE faul ts-to-be- inserted FROM (desi red-fai lure-di str ibut ion AND 

sped f 1c- system-port ion- of - interest ) 

ENDFOR 

ELSE 

IF type-result-needed * f a i lure-ef fects-anal ys i s THEN 

DETERMINE faul ts-to-be- inserted FROM ( type-of -f a i 1 ure-des i red AND 

specif ic-system-portlon-of- Interest ) 

ENDIF 

ENDIF 

DETERMINE environmental -model -d1 rect Ives /♦ for desired test ♦/ 

ENDPROCESS 
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executable>environmental-model 
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Figure 3-5 Execute Test 



PROCESS: 2.1 


Load Software 


•.PROCESS SPECIFICATION 


ASSEMBLE 


configured-system FROM (executabl e-envi ronmental -model + 

( funct ional - level -model + ( gate- 1 evel -mode i + 
(boundry- inf ormat ion + loadable-software)))) 


ENDPROCESS 


PROCESS: 2,2, Instrument System 
;PROCESS SPECIFICATION 

FOR EACH data-recording-di rect 1 ve DO 

IF data-recording-poi nt = "symbol” THEN 
FIND insertion-point IN symbol -table 
ELSE 

IF data-recording-point = " target -memory- 1 ocat ion" THEN 

FIND insertion-point USING funct iona 1 - 1 evel -symbol -tab! e 
END IF 
END IF 

CHANGE machine-object-code TO "trap" 

OUTPUT instrumented-system 
ENDFOR 

ENDPROCESS 


PROCESS: 2.3. Run Test 
;PROCESS SPECIFICATION 

IF test-directive CONTAINS " 1 oad-saved-state" THEN 
RETRIEVE' saved-state FROM saved-stat a-data-base 
END IF 

FOR EACH test-directive DO 
/♦ execute test directive 
ENDFOR 

ENDPROCESS 
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3.5 Data Reduction/ Analysis Phase 

The Data Reduct ion/ Analysis Phase is shown in Figure 3-6. For failure 
effects analysis or conventional performance analysis, this phase consists 
mostly of grouping and analyzing collected data to determine actions, trends, 
etc. For reliability analysis, this phase consists of data reduction and 
statistical analysis, followed by the use of the results in a reliability 
model of the system. 

3.6 Data Dictionary 

The Data Dictionary follows Figure 3-6 and defines all terms used in the 
data flow diagrams as well as the mini-specs. For the Data Dictionary, the 

following symbols indicate: 


1) 

= 

is composed of 

2) 

( ) 

optional item 

3) 


a 1 te r na t ive i terns 

4) 

n £ ^ m 

iterations of with optional lower (n) and upper (m) limits 

5 

+ 

and 
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PROCESS: 4.1, Determine Analysis to Perform 
: PROCESS SPECIFICATION 

IF type- resu 1 t -needed CONTAINS reliability-number THEN 
OUTPUT rel iabi 1 i ty-resul ts -needed 
ELSE 

IF type- resul t- needed CONTAINS f a i 1 ure-ef f ects-ana 1 ys i s THEN 
OUTPUT f a i lure-ef f ects-needed 
ELSE 

IF type-resu 1 t -needed CONTAINS perf ormance-character i st ic THEN 
OUTPUT perf ormance-character i s t i cs-des i red 
END IF 
END IF 
END IF 

ENDPROCESS 


PROCESS: 4.2, Reduce Execution Data 
;PROCESS SPECIFICATION 

ASSEMBLE execution-data USING perf ormance-character i st i cs-des i red 
EXTRACT perf ormance-measure 

ENDPROCESS 


PROCESS: 4.3, Analyze Effects of Inserted Faults 
;PROCESS SPECIFICATION 

FOR EACH faul ts-to-be- inserted IN execution-data DO 

DETERMINE ( effect -of -faul t AND propogat i on-of -f aul t ) USING execution-data 
OUTPUT fai lure-ef fects-resul t 
ENDFOR 

ENDPROCESS 


PROCESS: 4.4. Reduce and Group Execution Data 
•.PROCESS SPECIFICATION 

ASSEMBLE execution-data USING sped f Ic-system-port i on-of - 1 nterest 
CHANGE execution-data TO composite-execution-data 

ENDPROCESS 
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PROCESS: 4.5, Run Reliability Analysis 
;PROCESS SPECIFICATION 

/• Run reliability model using composite-execution-data */ 
CALCULATE predicted-ral i ab i 1 i ty-number USING 


ENDPR0CES5 


i 
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DATA DICTIONARY 


actuator-description = SELF_DEFINING /♦ description of actuators ♦/ 
analysis-result = [performance-measure I reliability-number | 
fai lure-ef f ects-resul t] 

block-functional -model = SELF_OEFINING /* description of the behavior of 

each functional block */ 

block-gate-model = SELF_DEFINING /♦ machine readable version of system logic 

diagrams broken into functional 
blocks */ 

block-gate-tabl e = Jgate-info? 
block-gate-tables = ibl ock-gate-table3 

block-number = SELF__DEF INING /* id of the block this gate is in */ 

boundry- informat ion = SELF_DEF INING /* list of inputs and outputs to system */ 

code-generat ion-descr ipt ion = op-code- 1 nformat i on + instruction-formats 

composite-execution-data = iexecut ion-data] 

conf idence- 1 evel = number 

confidence- level -desired = percentage 

configured-system » executable-envi ronmental -model + gate- level -model + 
funct ional - 1 evel -model + boundry- i nformat ion + 
loadable- soft ware 

current-gate-value = ["0" I ** 1 •* I "undefined" I "tri -state"] 
data-desired * [number-of -samples-necessary + type-data-necessary + 
conf idence- 1 evel -des 1 red + type-of -fai lure-desired + 
tdesi red-fai lure-di str ibut ion3 I type-data-necessary ] + 
specif i c- system-port ion- of - interest 

data-recordi ng-di rect i ve ' * data-recordi ng-point + data- to-be-gathered + 

[ t ime- 1 nterval 1 time i system-significant-event] + 
output-device + output-format 
data-recordi ng-di rect ives = Jdata-recording-direct i ve> 
data-recordi ng-point = ["symbol" I "target memory location"] 
data-recordi ng-poi nts = Cdata-recording-point> 

data-to-be-gathered =* SELF_DEF INING /♦ this item left unspecified since it 

" could be wide range of possibilities, 

ranging from modeled items to actual 
items in the modeling machine ♦/ 
desired-fai lure-di str ibut ion = probability-distribution 

des i red-performance- i nformat 1 on * SELF^DEFINING /* this is the performance 

characteristic which we need to ascertain. 

Since the possibilities are numerous, this 
definition is not constrained. */ 

durat ion-of -f aul t = number 

ef fect-of -faul t = [stuck-at-fault I transient-fault] 

environmental -model -descript ion = fsensor-descr ipt ion3 + -tactuator-descr i pt i on] 

loutput-device-descr ipt ionl + 

^interconnect ion-descr ipt ionj 
environmental -model -directive * ini tial -value + range-limits 
environmental -model -directives = lenvi ronmental -model -directive! 
env 1 ronmental -model -performance = time + sensor-state 
environmental -simulation-code = machine-object -code 
env i ronmental -symbol - tabi e » symbol -table 

event- Identifier = ["sensor out of bounds" I "machine parameter out of bounds" \ 

system-signi f leant -event ] 
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DATA DICTIONARY (CONT) 


executable-envi ronmental 'model = env i ronmenta 1 - s i mul at i on-code + 

environmental -symbol -table 

execution-data = lirun-1d + [rel iabil ity-sample-data I performance-sample-data 

I fai lure-ef fects-samp1e-data]i 

execut i on- t ime = number 

external -input = SELF_DEFINING /* this is an input from the system from the 

outside world. No restrictions are placed on 
its form or contents */ 

external - i nterf aces = ^externa 1 - i nputj + {externa 1 -outputj 

external -output = SELF_OEF INING /* this is an output to the outside world. 

No restrictions are placed on its form or 
content. ♦/ 

fai lure-ef fects-analysis = "failure effects needed” + 

specif ic-system-portion-of- interest + type-of -fai lure-des i red 
fa i lure-ef fects-needed = speci f 1 c-system-port ion-of - i nterest + 
type-of-fai lure-desired 

fai lure-ef fects-result = ffaul ts-to-be- Inserted + propogatlon-of-faultf 
fallure-effects-sample-data = 1 llnl t lal -state-data + 1 £faul ts-to-be- Inserted + 

Igate-behavior-datalij 

fa i 1 ure-mode-of - i nterest = f ai 1 ure-ef f ects-anal ysi s + faul ts- to-be- i nserted 
faul t- Insert ion = "fault inserted" + faul ts-to-be- inserted 

faul ts-to-be- i nserted = 1 ocat ion-of -faul t + time-of -faul t + ef f ect-of -f aul t 
+ durat ion-of -faul t 

funct ional -block = subsystem + internal - interf aces 
functional -blocks * {f unct i onal -bl ock} 

functional -element-performance » SELF_DEFINING /* performance measures of some 

portion of the system. This 
item is so variable, it is not 
specified in detail */ 

funct 1 onal - level -model = funct i onal - 1 evel -s imu 1 at i on-code + 
functional - level -symbol -table 
functional -level -simulation-code = machine-object-code 
funct ional - level -symbol -tab! e = symbol -table 
gate-behavior - last-gate-value + current-gate-val ue 
gate-behavior-data = gate- id + machine-cycle + gate-behavior 
gate- id = block-number + gate-nujnber 
gate- info = gate-state- info + £gate-output^ 
gate- 1 nterconnect i on- tabl e = tgate-id + £gate-id5} 

gate- level -model = J block-gate- tabl e} + {gate- i nterconnect ion- tabl e3 + 
gate- symbo 1 - tabl e 

gate-number = SELF_DEFINING /* id of this gate within Its block */ 
gate-output = SELF_DEFINING /♦ pointer to one of this gates outputs */ 
gate-perf ormance = £gate-behav i or-dataj 
gate-state- inf o = gate-type + gate-value 

gate-type°= ["AND" l^"OR" ^ ^ ""^NAND" I "NOR" I "INVERT" I "XOR" I "FLIP-FLOP"] 

gate-value * [ "O" I "1" I "undefined* "tri-state"] 

ini tial -State-data = time + {external - input! + {external -outputj + 

{sensor-state5 + {i nternal -statej 
i ni t lal -value = SELF_DEFINING 

insertion-point = machine-object-code-location 
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insertion-points = li nsert i on-po i nti 

i nstruct ion-f ormats = SELF_DEFINING /* information concerning addressing modes. 

bit patterns, etc as needed by the 
code generator */ 

instrumentation-point - mach i ne-object -code- 1 ocat i on 
instrumentation-points = ^ 1 nstrumentat ion-point| 

i ns trumented-system = imachi ne-object-codeJ + Idata-recording-pointsS + 

J.i nstrumentat i on -point si 

interconnection-description = SELF_DEF INING /* description of how sensors. 

actuators, output devices are connected 
to the test system */ 

i nterf ace-behav i or -mode 1 = SELF^^DEFINING /♦ list of interconnections between 

functional blocks */ 

i nternal - interfaces = SELF_DEFINING /* connect Iona between blocks */ 
i nternal -state = imachine-statei 
interrupt = SELF^DEFINING 

1 ast-gate-value = ["0" I "I'* I "undefined” I "tri -state"] 
loadable-software = |mach i ne-obj ect-code + symbol - tabi ej 
1 ocat lon-of -fault = gate- id 
lower-limit = SELF_DEF INING 

machine-cycle = SELF_OEFINING /* id of the current machine cycle ♦/ 
machi ne-ob j ect -code = SELF_DEFINING 
machine-object -code- locat ion = number 

machine-state = SELF_DEF INING /♦ this is the state of the computer, including 

registers, memory, mode and any other 
parameters necessary to describe the current 
status of the machine itself */ 

rnodej - i nf ormat ion = [ "gate- 1 evel -model -needed" I "model -subdiv 1 s ion- needed" 

I "monol i thic-model -needed" ] 

model -type -needed = "funct ional -model -needed" + ( "gate- level -model -needed" ) 

number * SELF^DEFINING 

number-of -sampl Gs-necessary * number 

op-code- inf ormat ion = SELF_DEFINING /♦ information concerning op codes as 

needed by the code generator ♦/ 
output -devi ce = ["disk" I "tape" i "console" I "line printer"] 
output-device-descr ipt ion = SELF_DEFINING /* description of any other system 

output devices */ 

output-format = ["decimal" I "octal" I "hexidecimal" I "binary" I 
"unformatted" ] 
percentage = number 

perf ormance-character i s t i c = "performance information needed" + 
des i red-per formance- inf ormat ion 

perf ormance-character i st i cs-des i red = spec i f 1 c- system -port ion-of - i nterest 
performance-measure = 5ELF_DEF INING /* this will depend on the type of measure 

desired, this is highly variable so 
no enumeration is given here */ 

performance-sample-data = 1 ii ni t ial -state-data + J.5 igni f icant-event-data|j 
predicted-rel i abi 1 1 ty-number = number + conf i dence- 1 evel 
probabi 1 i ty-di str ibut ion = SELF_DEFINING 
propogation-of -faul t = £gate-behav1or-data^ 
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data dictionary (cont) 


range-limits = upper-limit + lower-limit 

rel iabi 1 i ty-number = "reliability number needed" + conf i dence- 1 evel -des i red 
rel i abi 1 i ty-resul ts-needed = conf i dence- 1 evel -des i red 
reliability-sample-data = 1 1 sampl e-number + i n i t i al -state-data + 

i s ign i f i cant -event -datali 

run- id = number 
sampl e- number = number 

saved-state = [configured-system I i nstrumented-system] + time + 

^external - i nterf acesJ + Isensor-stateJ + ii nternal -states 
saved-state-data-base = i saved-statej 

sensor-descr ipt ion = SELF_DEFINING /* description of what sensor is and how it 

behaves. May be text */ 

sensor-state = SELF_DEF INING /* this is the current state of the sensor as 

defined by some parameters such as orientation, 
or by its output values */ 

significant-event-data * time + I external - i nputj + iexternal -output 5 + 

tsensor-statel + ^internal -statej + event- Ident i f ier 
sped f ic-system-port ion-of - interest = f unct i onal -bl ock 
stuck-at-faul t = [ stuck-at -one-f aul t I stuck-at-zero-faul t I 
stuck -at- i ndeterminate-f aul t ] 
stuck-at- indetermi nate-f aul t = SELF_DEFINING 
stuck-at-one-f aul t = SELF_DEFINING 
stuck-at -zero-fau 1 t = SELF^DEF INING 

subsystem - SELF_DEF INING h any reasonable chunk of the system which can be 

isolated as an identifiable piece ♦/ 
symbol -table = insertion-points + instrumentation-points 
/♦ + a bunch of other stuff ♦/ 

system-block-diagram = SELF_DEFINING /* block diagram of the system of interest 

showing major components and their 
interfaces */ 

system-block-diagrams = ^system-block-diagram§ 
system-boundr ies = i system-boundryl 

system-boundry = £ f unct i onal -block! + £i nternal - i nterf acesi + 
externa 1 - 1 nter faces 

system-environmental -description = SELF_OEFINING /* description of the behavior 

of the system external 
environment including all 
input and output */ 

system-functional -description = SELF_DEF INING /* description of the functional 

level behavior of the system, 
including instruction fetch and 
decode of the computer(s) */ 

system- logic-diagram = SELF^DEFINING 

system- logic-diagrams = isystem- 1 ogic-diagraml 

system-model * executabl e-envi ronmental -model + funct i onal - level -model + 

( gate- 1 evel -model ) + boundry- i nf ormat i on 
system-significant-event = [interrupt I trap \ faul t- insert ion] 
test-conduct-directive * i n i t i al -state-data + execut ion-t ime + sample-number 
test-directive = £f aul ts- to-be- i nsertedl + ienv i ronmental -model -di rect i ve*i 
+ £test-conduct -di rect i vel 
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DATA DICTIONARY (CONT) 





test “d1 rect 1 ves * £ test-dl recti vej 

test-software = SELF^DEF INING /* source software for the system under test */ 
test-system-definition system-envi ronmental -descr ipt ion + 

( system- 1 ogi c-diagrams ) + system-f unct ional -descr ipt ion + 
test-sof tv/are -i system-block-diagrams 

time = number 
t ime- interval = number 
t ime-of -faul t = number 
transient-fault = SELF_DEF INING 

trap = SELF_DEFINING this is the occurrance of a system trap inserted for 

the purposes of recording data or some such reason */ 
trap- i nsert i on = SELF_DEFINING 

type-data-necessary = (gate-performance) + ( funct ional -el ement-performance) + 
( env i ronmenta 1 -mode 1 -perf ormance ) 
type-of-failure-desired * [ stuck-at-f aul t t trans ient -f aul t ] 

type-result-needed = [performance-character i st ic I f a i 1 ure-mode-of - i nterest 
I rel iabi 1 i ty- number] + speci f ic-system-portion-of- interest 
upper-limit = SELF_OEF INING 
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4.0 System Specification 

4.1 General System Configuration 

This specification describes the requirements for the digital avionics design 
and reliability analyzer. This facility consists of two major hardware 
itemsas shown in Figure 4-1, a general purpose computer providing user support 
and interface, simulation, and numerous other pieces of software; and an 
emulation computer to provide either gate level emulation or general 
instruction level hardware emulation. These two computers are interfaced for 
synchronization and data transfer. The software for the facility is shown in 
diagram 4.2. Of the five major components only a small part of the general 
purpose support software, the model building software and the test execution 
software run on the emulation computer. The major portion of the software 
runs on the general purpose computer. 



Figure 4-1 System Hardware Components 
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Figure 4-2 System Software Components 







4.2 Hardware Configuration 


The aystem shall consist of two cooperating machines connected via an 
interface. These machines shall be: 

1) A general purpose computer providing user interface; software support 
such as editors, assemblers, compilers, simulation support; and 
analysis support. 

2) An emulation computer supporting emulations ranging from gate level 
to instruction level. 

4.2.1 General Purpose Machine 

4. 2. 1.1 Central Processor 


4. 2. 1.1.1 The system shall have a real-time clock (interval timer) for use by 
the operating system 


4. 2. 1.1. 2 Machine hardware instructions shall include integer, single and 
double precision floating point, packed-decimal, character string 
manipulation, bit shifting and rotating, and logical instructions. 


4. 2. 1.1. 3 

J - J * 1 ^.. 

uj.vi.ue uj 


Hardware fault detection shall be provided, i.e., 
eXjjuiient cver^xOw, and expcriciit underflow* 


detection of 


4*2*1«1.4 The system shall detect a power failure or fluctuation and have the 
capacity to provide for an orderly system shutdown. Upon re-establishment of 
stable power, automatic restart of the system must be provided for. This 
requirement may be met by battery back-up to maintain proposed MOS (metal 
oxide semiconductor) memory allowing for operator notification and 
intervention. The system must be maintained for a long enough period to 
permit any necessary steps to be accomplished to allow for restart of the 
system and user programs. 

4. 2.1. 1,5 The architecture of the system shall be based on a computer with 
effective addressing, register size, and interger arithmetics of at least 
sixteen (16) bits. 


4. 2. 1.1. 6 The general purpose computer shall have the speed and power 
necessary to execute the environmental model and the functional level model 
specified in 4. 3. 2.1 in the normal operating mode, cooperating with each 
other, at a slow down of not more than 3000 times real time. 

4 • 2 . 1 , 2 Memory 

4.2.1. 2.1 The memory requirements stated are in terms of bytes. A byte is 
defined as the alphanumeric character oriented unit of measure composed of a 
least eight (8) bits. Manufacturers whose internal architecture is such that 
they normally operate with less than 8 bit bytes must adjust their bytes or 
words of memory proposed to reflect the 8 bit requirement. Memory single word 
size must be at least sixteen (16) bits available to user programs. 
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4.2.1»2.2 The initial configuration must be a minimum of one-half (1/2) 
million bytes of main memory. The system architecture shall not preclude a 
single user program from utilizing the full complement of main memory beyond 
the residency requirement of the operating system and related software. 

Both hardware and software shall support two (2) million bytes of physical 
memory for expansion purposes. 

4. 2. 1.2. 3 Areas or regions of memory shall be memory protected to facilitate 
the protection of the operating system and individual user programs. This 
requirement may be met by any combination of hardware and/or software features. 

4. 2. 1.2. 4 Single bit fault correction and multiple bit fault detection shall 
be provided. 

All detected memory faults shall be logged by the system. This log shall be 
accessible by either a vendor, customer engineer, and/or government personnel. 

4.2.1. 2.5 The rationale for the one-half (1/2) million bytes of main memory 
is as follows: 

1) Traditionally, interactive graphics systems tend to be complex and to 
require significant amounts of memory to operate effectively. The 
interpretive graphic subsystem is only a small portion of the total 
system and will undoubtedly have to operate concurrently periodically 
with other tasks. Even if operating by itself, it is quite 
conceivable that once in a production mode that multiple digitizing 
stations will be required. 

2) As detailed system design has not been completed, it is difficult to 
predict with an accuracy the ultimate memory requirement of the test 
execution software. The following items will need to be memory 
resident for the test execution and in total will be significant in 
terms of memory required: 

a. Fault tolerant target machine object code. 

b. Compiled hardware description code for the target machine. 

c. Actuators/sensors values and associated bound limits. 

d. Fault data being introduced. 

The fault tolerant target machines will be complex in terms of having 
redundant hardware components and significant associated control /management 
software. 

3) With anticipated run times of test execution software to be in terms 
of hours or days, throughput can not be significantly degraded due to 
excessive page thrashing and/or overlay roll-in and roll-out. The 
requirement for physical memory to be expandable to two (2) million 
bytes is to keep execution times within reason as the fault tolerant 
systems under study become more complex. 
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4.2.1.2.6 Memory allocation shall be dynamically allocated with the ability 
to support at least four (4) interactive devices concurrently at installation 
time and expandable to eight (8). A minimum of two (2) batch jobs must run 
concurrently with the interactive users. The environment is to be that of 
true multiprogramming, i.e., a fixed partition foreground /background 
environment specifically shall not be permitted. 

4. 2. 1.3 Disk Storage 

4. 2.1. 3.1 Five-hundred (500) million 8 bit bytes of removable and 
interchangeable formatted disk storage shall be available to the users of the 
system. Disk storage required for system software is in addition to this 
requirement. This disk space for the system shall be expandable by a factor 
of two (2). 

4.2.1. 3.2 Average access time including latency and seek time, shall be 55 
milliseconds or faster. The transfer rate shall not be less than 800,000 
8-bit bytes per second. 

4.2.1.3.3 The vendor shall provide an initial complete set of recording 
media, as well as a complete backup set, both containing no more than 0.01% 
unnacceptable sectors per unit. 

4. 2. 1.3. 4 A minimum of two (2) physical drives are required. 

4. 2. 1.3.5 The five-hundred (500) million 8-bit bytes of removable and 
interchangeable formatted disk storage for the user is considered justified 
for the following reasons: 

1) Disk I/O spooling area for local print output. It is anticipated 
that the various report products shall be maintained on disk for 
several work days during their review, the rationale being to save 
computer run time in the event that additional copies are required 
for further study and distribution. 

2) Provision for multiple files of gate level logic diagrams with 
associated legend. These files represent the various portions of the 
target fault-tolerant computer system under evaluation. Different 
portions of the target fault-tolerant computer system will be at 
different stages of the capturing and editing of gate level logic 
diagrams via the interactive graphics subsystem. 

3) Program source code library. 

4) Program object code library. 

5) Multiple gate queueing structure tables in emulator computer 
compatable format. 

6) Multiple fault data files. 
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7) Multiple files of the initial conditions and bound limits of the 
avionic actuators and sensors of the fault tolerant systems under 
study. 

8) Multiple hardware configuration descriptions defining various fault 
tolerant system options. 

9) Library of procedure files and parameter files. 

10) Data files associated with mathmatical and statistical analysis 
routines. 

11) As the mechanical/electronic nature of disk drives require frequent 
maintenance, the requirement of two physical spindles was specified 
to allow some work to continue when one drive in unavailable. 
Admittedly, the capability will require careful organization of the 
disk files. 

12) Further, utilizing a large capacity disk drive allows achievement of 
economy of scale. For example, a calculation revealed that, for one 
vendor, going from a medium to a large capacity drive resulted in a 
162% increase in capacity for a 76% increase in cost. 

13) Disk space is required for the recording of data during the execution 
of test software (4.3.4). The approach taken of recording 
information only when out of limits conditions occur (4. 3.4.4. 3) is a 
compromise over what the run data recording requirements could be. 

The calculation provided here is an example of what the run storage 
requirements would be if recording of data were to be done for each 
target machine simulated/emulated cycle. 

Assumptions : 

6.000 gates 

15% gate state changes 

IjUs target machine cycle time 

1.000 samples per run. 

Calculations : 

900 (15% of 6,000) gate state changes per cycle 

900.000. 000 gate changes per sample 

9000.000. 000.000 gate changes per run 

With 3 gate changes recorded per 

32 bit word, 300,000,000,000 words of disk required. 

1.200.000. 000.000 bytes of disk required 
or if on magnetic tape 

With 4,000 character tape blocks 

300,000,000 blocks at 3 inches of tape each is 900,000,000 
inches of tape 

900.000. 000 - 28,800 = 31,250 2400 foot reels 



- ^ 
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4.2.1.4 I/O Devices 

4.2.1.4.1 Tape Drivers 

4.2;1.4.1.1 Two (2) read/write nine track 1600 CPI phase encoded tape drives 
of not less than 75 IPS read/write speed or less than 120,000 bytes per second 
peak transfer rate shall be provided. 

4. 2.1. 4. 1.2 The tape units shall provide for read-af ter-write check feature. 

4. 2. 1.4. 1.3 The tape units shall handle up to 2400 foot reel size. 

4. 2. 1.4. 1.4 The tape units shall be of the vacuum chamber type. Mechanical 
feed arms are not permitted. 

4. 2. 1.4. 2 Line Printer 

4. 2. 1.4. 2.1 One (1) impact type printer shall be provided. The printer shall 
have no fewer than 132 print positions. The ASCII character set of 95 
characters shall be employed. The proposed printer shall be able to line 
space at 6 and 8 lines per inch, vertically. The printer must provide 
standard horizontal spacing of ten characters to the inch. 

4. 2. 1.4. 2. 2 The throughput requirement is minimally 600 lines per minute when 
printing full 132 character lines consisting of the 95 printable character set. 

4.2.1. 4.2.3 The system shall be upgradable to a configuration of two (2) 
printers meeting these specifications. 

4. 2. 1.4 .3 Operator Console 

4. 2. 1.4. 3.1 The system operating console shall provide for hard copy output. 
The console must be of rugged construction capable of withstanding heavy use, 
i.e., continuous use during operating hours. This requirement would not 
preclude a printing unit operating as a slave to a CRT operator console. 

4.2. 1.4.4 Telecommunications Hardware 

Telecommunication hardware shall be provided to handle data exchange and its 
associated line disciplines between local terminals and the host computing 
system. Attached terminals will be used for time sharing, inquiry /response, 
local graphics and local plotting. The system shall be able to handle half 
and full duplex lines concurrently. Circuit disciplines in general shall 
include at the minimum start/stop half duplex and full duplex asynchronous 
transmission. 

4. 2. 1.4. 4.1 All telecommunications hardware supplied by the vendor shall 
conform to the Electronic Industries Association Standard RS 449. The 
government has adopted RS-449 as Federal Standard 1031, which became mandatory 
for all procurements by federal agencies starting June 1, 1980. EIA Standards 
RS-449, RS-422, and RS-423 are intended to gradually replace RS-232-C. 
Telecommunication hardware which conforms to the new standards shall be 
provided. The vendor's proposal must state how existing terminals which 
conform to RS-232-C will be accomodated. 
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4. 2. 1.4. 4. 2 The telecommunications hardware/software shall support the 
Teletype (TTY) start -stop asynchronous communications. The 
emulators/simulators are intended for use with various ADP vendors, so the 
proposed emulation/ simulation shall not be specifically designed for any 
particular vendor. 

4. 2. 1.4. 4. 3 The initial four (4) communications ports (See A. 2. 2.5) will be 
utilized for some combination of alphanumeric CRT(s), graphic display, 
digitizer board, and electrostatic plotter. The proposed system must be 
upgradable to eight (8) communication ports. 

4. 2. 1.4. 4. 4 One (1) interactive user CRT terminal is to be provided by the 
vendor. The unit will be locally attached to the CPU operating at the speed 
of 1200 BPS or faster. The physical connection will not exceed the industry 
standard of fifty feet. This unit will be utilized by the software program 
specified under 4.3. The following are minimum specifications to be met: 

1) 80 character line width 

2) 24 vertical lines 

3) Fill-in-the-form capability with the form stored in the background 
and variable information entered in the foreground. Once the form 
has been loaded to the CRT memory, it will be utilized for a series 
of transactions without need for retransmission from the computer. 
Only the variable information is to be transmitted to the computer, 
during the data entry process. 

4) Normal and reverse video 

5) Double intensity 

6) Blinking 

7) Underlining 

4. 2. 1.4. 5 General Purpose Computer Emulation Computer Interface 

4. 2. 1.4. 5.1 The contractor shall provide any necessary hardware to 
interconnect the general purpose computer to the emulation computer. (Related 
software is specified under 4.3). 

4. 2. 1.4. 5. 2 Data transfers between the two computers will consist of the 
following: 


4. 2. 1.4. 5. 2.1 For reliability analyses data gathering, the transfers will 
include: 
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1) At the start of the run, the general purpose computer will load the 
emulation computer control memory with the gate level emulation 
algorithm and the emulation computer primary memory with the gate 
tables for the specific system portion of interest. 

2) During the run at each fault insertion time, the gate tables within 
the emulation computer primary memory will be updated to reflect the 
inserted fault by the general purpose computer. 

3) During the run, for each machine cycle, the inputs to the block being 
emulated at the gate level will be transferred from the general 
purpose machine to the emulation machine and the outputs of the block 
will be transferred back from the emulation machine to the general 
purpose machine. The quantity of data transferred depends on the 
degree of interconnection between the emulated block and the rest of 
the system. 

4. 2. 1.4. 5. 2. 2 For failure effects analysis, the data transfers will be the 
same as specified in reliability analysis data gathering. In addition, at the 
end of each cycle, the new state of each changed gate may potentially be 
transferred from the emulation computer back to the general purpose computer. 

4. 2. 1.4. 5. 2. 3 For standard emulation purposes, data transfers will be as 
follows: 

1) At the start of the run, the emulation machine control and primary 
memory will be loaded with the appropriate software by the general 
purpose machine. 



2) During the run, input and output data from the environmental 
simulation to the emulated machine and back will be transferred at 
appropriate times. 

3) Additional data concerning the state of items in the emulated machine 
may potentially be transferred back to the general purpose machine 
for performance evaluation purposes. 

4.2. 1.4. 5. 2. 4 The speed of the computer-computer interface for data transfer 
shall be sufficiently fast so that the predominant amount of time in the 
reliability analysis data gathering experiments will be time for 
simulation/emulation of the system rather than for data transfer. 

4. 2. 1.5 Interactive Graphics Subsystem 

4. 2. 1.3.1 The contractor shall provide the necessary hardware to capture and 
validate gate level logic diagrams. (Related software is specified under 4.3) 

4. 2. 1.5. 2 The graphics workstation is to be made up of the following 
components : 
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1) Digitizing Board with cross-hair cursor. The digitizing surface must 
be large enough to handle logic diagrams up to standard size E (34 X 
44"). The gantry style digitizer is preferred, but 

mandatory. The logic symbols are not to be digitized in detail. The 
symbol type is to be selected from a menu and the symbol position is 
' to be recorded via the cursor. With this approach, a digitizer board 

with minimal accuracy, resolution, and repeatability may be 
utilized. A resolution of 100 points per inch is adequate. The 
working surface shall have both tilt and height control. 

2) Graphics CRT with alphanumeric keyboard. The minimum screen size 
shall be 19". This requirement may be met by a single raster scan 
type graphic CRT with the capability of a reference drawing being 
flLhed onto the screen from which a zoom-in area may be selected. 

The requirement may also be met via two storage tube type graphic 
CRT’s. A reference drawing would be displayed on one CRT while 
zoom-in areas are displayed on the second CRT. 

3) Electrostatic plotter with a roll paper width of 36". The unit shall 
have a resolution of 100 points per inch. The electrostatic plotter 
will be used primarily for quick turnaround images for validating 
plots against the original input document, i.e., gate level logic 
diagrams. 

4.2.2 Emulation Computer 

4. 2. 2.1 CPU Architecture - The emulation computer shall be user 
microprogrammable. The microcode shall provide control over primitive 
functions within the machine (e.g. connection of registers to busses, ALU 
operations, etc) and shall provide the capability for parallel operations 
within a microword. 

4. 2. 2. 1.1 Microcode containing the NASA Langley gate-level algorithm or 
similar algorithm must be programmed into the emulator. Due to the stringent 
speed requirements for processing such an algorithm, the microcode must 
perform multiple operations in parallel. 

4. 2. 2. 1.2 Each gate being processed is described by a gate information word 
of eight or more bits. This word is also the address to which control is 
transferred in micro store, thus micro store must be sufficient to handle all 
locations addressed. 


4. 2. 2. 2 Memory 

4. 2. 2. 2.1 Microprogram Memory - Sufficient microprogram memory shall be 
provided to accomodate a table-driven gate level emulation algoritta. 
Requirements of the algorithm are detailed in paragraph 4. 3. 4. 7. As a 
minimum, at least IK words of microprogram memory shall be provided. 
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4.2*2.2*2 Primary Memory - Sufficient primary memory shall be provided to 
contain the gate level tables required by the emulation algorithm. These 
tables shall accomodate at least 5000 gates with an average gate fan-out of 
2. As a minimum, at least 32K words of primary memory shall be provided. 

4. 2. 2. 3 The selection of the micro code to be executed shall be via a 
"vector" type mechanism. That is, some combination of bits in a word 
containing gate status shall provide the address of the microinstruction to be 
executed. Such a mechanism precludes the necessity of testing individual bits 
to determine the action to take for a particular gate. 

4. 2. 2. 4 The emulation computer shall also be useful for instruction level 
emulation of digital devices. The characteristics of the machine shall be 
such that it will accomodate such emulation. 

4. 2. 2. 5 The emulation computer shall be interfaced to the general purpose 
computer for data transfer and for software level synchronization of 
cooperating, parallel simulations and emulations in the two machines. Data 
transfers expected are defined in 4. 2. 1.4. 5. 

4. 2. 2. 6 The emulation computer shall have the speed and power necessary to 
execute the gate level emulation, in the normal operating node, for 6000 gates 
for 0.1 seconds of emulated time in 5 minutes or less of real time. The cycle 
time of the emulated system for this timing figure shall be 1 microsecond, the 
average gate fanout shall be 2, and in any one cycle, 5% of the gates will 
change value, on the average. 

4.3 Software Configuration 

The software consists of five major pieces. These pieces are: 

1) General purpose support software 

2) Model building software 

3) Test generation software 

4) Test execution software 

5) Analysis software 

The software, with exception of some of the test execution software, some of 
the general purpose support software, and some of the model building software 
shall run on the general purpose machine. 

4.3.1 General Purpose Support Software 

4.3.1 .1 General Purpose Machine Operating System 


v::i 
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The system shall feature a single, fully implemented operating 
system that integrates all the hardware and software that emprise the 
system. The operating system shall be generally available in the market 
place. More specifically, all features and capabilities shall have b^n 
publicly and formally announced and operational prior to the offer submission 

deadline. 


4. 3. 1.1. 2 It is anticipated that the primary mode of operation will be a 
single operator performing a single task. Examples would be a single graphic 
station capturing a logic diagram or a simulation/emulation job running alone 
in the system. However, the architecture of the system shall not preclude the 
concurrent processing of a simulation/emulation run with the digitization 
process. Nor should the architecture preclude the addition of a second 
graphic work station in the future to operate concurrently with the original 

graphic work station. 

Allocation of resources to tasks shall be performed as automatically as 
possible. All the software items specified throughout this document shall be 
able to operate concurrently with any and all others, except for restrictions 
such as momentary unavailability of an equipment resource. 

4. 3. 1.1. 3 The operating system shall provide a dynamic environment. That is, 
memory management shall be done in such a manner that all concurrent runni^ 
jobs in total may require more memory than what is physically available, rae 
addition of more physical memory would improve the system's performance, ^is 
capability shall be provided without requiring the ‘ programming staff to define 
overlays. 

4. 3. 1.1. 4 The operating system shall have the ability to produce and retain 
in mass storage for later processing, resource utilization data pertinent to 
each task performed. The resource data produced shall include most or the 
following by user account /charges 


1) 

Number of lines printed 


2) 

Central processing unit 

usage 

3) 

Input/output usage 


4) 

Remote terminal connect 

time or traffic statistics 

5) 

Actual memory used 


6) 

Amount of mass storage 

used 


Simplified measuring units such as the aggregate of the items above, shall be 
reversible to the individual component level. 



4.3«1«1«5 The operating system shall operate the following basic job origin 
tasks concurrently: 

1) Interactive 

2) Local batch 

4.3.1.1.6 The operating system shall provide for I/O spooling. Spooling of 
local print output shall be provided for. Spooling is defined here as 
providing a temporary file that will act as a buffer for spontaneous input or 
output of data and thereby reduce impacts to executing programs waiting for 
I/O services. Direct, i.e., non-spooled, I/O shall also be available for 
time-critical transmissions . 

4. 3. 1.1. 7 Terminal users of the system shall be able to communicate with the 
system operator via terminals and vice versa via the operator control 
console. This is required because terminal users may or may not reside in the 
same room as the system. 

4. 3. 1.1. 8 Interactive batch job submittal from time-sharing devices shall be 
provided. The user shall be pemitted to save files on either magnetic tape 
or mass storage disk. 

4.3.1. 1.9 A terminal user, shall be able to determine the status of a batch 
job from a time— sharing terminal. 

4.3.1.1.10 The system response time to an interactive user system command 
shall not exceed an average of two (2) seconds. The absolute maximum response 
time shall not exceed thirty (30) seconds. 

4.3.1.1.11 The system must provide for a job control language that allows the 
user to override system defaults and parameters pertinent to job management, 
job scheduling and data management. This provision shall provide control over 
job priorities; job termination options; programmatic steps within a job 
stream; job dispatching and execution, etc. 

4.3.1.1.12 Security and system authorization. The system must limit access 
to any and all installation resources, including files and data contained 
therein. This facility will only allow processes to those users that are 
pre-defined as authorized for access. Read and write permits must be features 
within the data authorization scheme. 

4.3.1.1.13 The operating system shall be considered to be state-of-the art. 
That is, the operating system shall have been designed, developed and 
implemented to support an environment of concurrent interactive and batch 
jobs. The system being specified in this document is to be utilized in the 
evaluation and testing of fault tolerant airborne avionic computers of the 
future. When consideration is given to this fact and the fact that the 
environment is one of new technological development, it is prudent and 
reasonable that only the best available resources and tools should be made 
available for the project. 
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4.3.1. 2 User Oriented System Software Components • 

4. 3. 1.2.1 A file editor shall be provided with the following minimiim 
capabilities : 

1) With the exception of binary object files and files written by 
FORTRAN as unformatted, be able to manipulate any and all files used 
by the system. 

2) Must be available interactively and optionally be available through 
the batch mode. 

3) Must contain, as a minimum, the following, or equivalent capabilities: 

1. Replace String 

2. Change line 

3. Delete 

4. Print/List 

5. Search (forward and backward) 

6. Insert 

7. Add 

4) Must provide the user with the view that his whole file is 
immediately available to him, that is, he must not have to 
specifically fill and empty the current edit buffer. 

4. 3. 1.3 General Purpose Programming Languages 

4. 3. 1.3.1 A FORTRAN compiler that minimally meets the ANSI X. 3. 9-1966 
specifications shall be provided. The delivered compiler must be stable and 
thoroughly debugged. 

4. 3. 1.3. 2 An ASSEMBLER or hardware level compiler shall be provided which 
possesses features not available in the high level programming lanpaps 
required under 4. 3. 1.3.1 and 4. 3. 2. 2. Bit and character level manipulation, 
privileged instructions, register referencing, and branching based on hardware 
conditions shall be provided. 
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4.3. 2.1 The relationships of the various models shall be as follows: 

4»3«2.1.1 The environmental simulation shall execute in the general purpose 
machine and shall simulate the effects of the systems, sensors and activators 
which interface to the digital avionics computer(s). This would include such 
items as attitude and rate sensors, attitude control activators, etc. This 
model shall produce the identical effect as if the avionics computer(s) were 
connected to actual devices in a real system. 

4. 3. 2. 1.2 Functional level machine simulation shall provide a model of the 
behavior of the avionics computer(s). This simulation shall be at the 
instruction level of the computer such that actual software may be executed by 
the simulation with results identical to the real avionics computer(s). This 
model shall interact with the environmental model, reacting to the inputs 
provided by that model and producing the appropriate outputs to that model. 
This model shall also interact with the gate level emulation(s) which are 
active by providing the appropriate inputs and receiving the outputs of the 
gate level emulation(s) . 

4.3.2.1.3 The gate level emulation shall provide a model of the behavior of a 
portion of the digital avionics computer(s) at the gate level. It shall 
interact with the functional level simulation by receiving inputs from that 
simulation and by providing appropriate outputs to the simulation. This model 
will correctly propagate inserted faults to its outputs. 

4. 3.2. 1.4 The instruction level machine emulation, is intended to interact 
only with an environmental simulation. This model shall provide the 
capability to emulate, on the emulation machine, a complement of digital 
hardware at the instruction level. This is intended for gathering data 
concerning performance and not for failure effects or feliability evaluation. 

4.3. 2. 2 Model Description Translators 

4. 3. 2. 2.1 Environmental Model Translator 

4.3. 2. 2. 1.1 A translator shall be provided which will translate an 
environmental description into executable simulation code on the general 
purpose machine. The translator shall accomodate descriptions of outputs, 
inputs, limits, etc., for sensors, activators and items external to the 
avionics computer(s). The translator shall accomodate tagging items of 
interest for later checking on limits during the simulation execution. 

4.3.2.2.1.1 The environmental model execution code shall interface to and 
provide input and output for the functional level computer simulator. 

4. 3. 2. 2. 2 Functional Model Translator 

4. 3. 2. 2. 2.1 A translator shall be provided to translate a functional or 
instruction level description of one or more digital avionics computers into 
executable simulation code on the general purpose machine. 


4. 3. 2. 2. 2. 2 The functional model translator shall use a hardware description 
language which allows expression of the structure and behavior of digital 
systems* 

4. 3. 2. 2. 2. 2.1 The hardware description language shall provide for expression 
of timing and synchronization, both between internal elements and between the 
system being described and the external environment. 

4 . 3. 2. 2. 2. 2. 2 The hardware description language shall provide for description 
of the interface between the system being described and the external 
environment in terms of inputs and outputs. This description shall provide 
the tie to the executable code for the environmental model so the two models 
will work together. 

4. 3. 2. 2. 2. 2. 3 The hardware description language shall allow the description 
of the system in terms of independent functional blocks and the interfaces 
between those blocks. The translator shall produce code which allows the 
replacement of the code for a functional block with seething else which will 
provide the same inputs and accept the same outputs without modifying the 
model itself. This shall provide the link between the functional simulation 
model and the gate level emulation model. 

4 . 3. 2. 2. 3 Gate Level Model Translator 

4. 3. 2. 2. 3.1 The gate level model translator shall translate system logic . 
diagrams to the gate level tables needed by the gate level emulation algorithm. 

4. 3. 2. 2. 3. 2 The gate level model translator system shall include all 
necessary interactive graphics software to operate on the general purpose 
computer for capturing and validating logic diagrams. Standard logic symbols 


shall be 

used for: 

1) 

Inverter 

2) 

AND gate 

3) 

OR gate 

4) 

XOR (exclusive OR) gate 

5). 

NAND (not AND) gate 

6) 

NOR (not OR) gate 

7) 

RS flip-flops 

8) 

T flip-flops 

9) 

D flip-flops 

10) 

JK flip-flops 

11) 

other logic devices. 
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The software capability to capture legend and associate the legend with each 
gate or device symbol shall be provided. The software shall provide for 
multiple logic diagram sheets for a single function to be emulated. That is, 
a single logic diagram up to 34" X 44" in size will not always represent an 
entire function to be emulated as a complete unit. Yet each separate sheet 
must be stored on disk as a subunit for output on the electrostatic plotter 
for the validation process. Off diagram linkages to other sheets must be 
provided for. A single "E" size drawing 34" X 44" will represent 
approximately 1500 gates. 

4.3. 2.2. 3.3 The translator shall translate the logic diagrams captured via 
the graphics system to produce the gate level tables required by the gate 
level emulation algorithm on the emulation computer. Each functional block, 
corresponding to the functional blocks of the functional level simulation 
model, shall be in a separate table, identifiable with the corresponding 
functional level block. 

4.3.2.2.3.4 A language translator shall also be provided which allows 
description of gates and their interconnections in a purely textual manner. 

The output of this translator shall be identical to and compatible with the 
graphics input translator. 

4.3. 2.2.4 Instruction Level Emulation Translator 

4. 3. 2. 2-4-1 The instruction level emulation translator shall provide the same 
functions as the functional model translator specified in 4. 3. 2. 2. 2 except 
that the executable code to which the description is translated shall be code 
for the emulation computer rather than the general purpose computer. 

4.3.2. 2. 4. 2 The instruction level emulation translator shall use a hardware 
description language but there is no requirement for partitioning into 
functional blocks. The translator shall provide the interfaces to the 
environmental model simulation running in the general purpose computer. 

4.3. 2. 2.4. 3 The instruction level emulation translator shall provide a code 
generation description output which may be input to a retargetable software 
translator such as a compiler or assembler which will translate the software 
to drive the described system. The description shall provide instruction 
formats, machine code descriptions and any other data necessary* 

4.3. 2. 3 Link Software. Any software necessary to link the various models and 
allow them to communicate shall be provided. 

4.3. 2.4 Model Debug Packages 

4. 3. 2. 4.1 Debug packages shall be provided for each type of model. 

4. 3. 2. 4. 2 The debug packages shall support interactive control and display of 
actual system parameters and modeled system components. 
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4<3«2>4«2.1 The debug packages shall support control of each model including 
as a minimum 

1) Start 

2) Stop 

3) Single Step 

4) Trace 

5) Breakpoints (minimum of 16) 

6) Item value change trace 

7) Continue after break 

8) Interactive modification of values 

4. 3. 2. 4. 2. 2 The debug package shall support display of both actual and 
modeled systems items including as a minimum: 

1) memory 

2) registers 

3) emulated gates 
\ 

4) external inputs and outputs (environmental simulation) 

5) processor state 

6) time 

4.3. 2.4.2. 2.1 The items to be displayed shall be specifiable by the operator 
including display device and format. 

4. 3. 2. 4. 2. 2. 2 Items shall be capable of being tagged for display in response 
to system events such as: 

1) breakpoint 

2) interrupt 

3) user command 

4) trace 

5) single step 
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4.3.3 Test Generation Software 

Software for developing test scenarios and fault insertion shall be provided. 

4. 3. 3.1 Test Scenario Software 

4. 3. 3. 1.1 The test scenario software shall include the capability to specify 
a sequence of runs, perhaps with differing parameters, which will subsequently 
be run automatically by the system. 

4. 3. 3. 1.2 The test scenario software shall include the capability to specify 
initial values of all external inputs, simulated devices and internal state, 
including time, of the test system. This shall include the capability to load 
a configured system which has been previously stored on a storage device. 

4. 3.3.1. 3 The test scenario software shall include the capability to specify 
specific data to be collected, the format of the data and the event in 
response to which the data shall be recorded. 

4. 3. 3. 2 Fault Insertion Generation 

4. 3. 3. 2.1 The fault insertion generation software shall provide the 
capability for either automated or manual generation of faults to be inserted 
in the gate level emulation. 

4. 3. 3. 2. 2 The fault insertion generation software shall produce the following 
information concerning each fault to be inserted: 

1) Gate identifier to receive the fault 

2) The simulation/emulation run time at which the fault is to be applied 
in terms of sample number and fraction of time within the sample. 

3) The duration of the fault 

4) The fault state that is to be introduced, i.e., steady zero state, 
steady one state, intermittent zero state, or intermittent one state, 
or alternating between the zero and one. state. 

4. 3. 3. 2. 3 The manual fault generation software shall allow the analyst to 
specify all of the factors for each fault as given in 4. 3. 3. 2. 2. 

4. 3. 3. 2. 4 The automated fault generation software shall allow the analyst to 
specify the following: 

1) Number of faults to be generated 

2) Specific system portion of interest or probability distribution of 
faults across the system 

3) Probability distribution of faults over time for each sample 

4) Probability distribution of type and duration of faults. 
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4. 3. 3. 2. 4.1 The automated fault generation software shall produce the data 
specified in 4. 3. 3. 2. 2 through the use of random number generators to provide 
the desired distributions. 

4. 3. 3. 2. 5 The fault generation software shall produce the information 
specified in 4. 3. 3. 2. 2 in such a way that the simulation execution system will 
use it to insert faults at the specified time in the specified sample in the 
gate level emulation. 

4. 3. 3. 3 Test Driver Software Generation 

4. 3. 3. 3.1 The capability shall be provided for translating software for the 
target (emulated /simulated) machine on the general purpose computer. The 
translated software shall be used to drive the test system during the 
execution phase. 

4. 3. 3. 3. 2 It is highly desirable that the translation process be entirely 
automated, taking as input the source code and a description of the machine 
for which code is to be generated and then producing object code for that 
machine. Translators which operate in this mode are often referred to as meta 
assemblers or meta compilers* 

4. 3. 3. 3. 3 It is highly desirable to have a meta compiler delivered to satisfy 
this requirement. However, given the current state-of-the-art, a meta 
compiler with retargetable code generator is not available. As a minimum a 
meta assembler is required. 

4. 3. 3. 3. 3.1 The meta assembler will have as one input a description of the 
instruction formats, operation codes and addressing modes of the machine for 
which code is to be generated. This input shall be produced from the hardware 
description language specified in 4. 3. 2. 2. 2, augmented as necessary for this 
particular task. 

4. 3. 3. 3. 3. 2 If a meta compiler is proposed, it shall have the same 
requirements as specified for the meta assembler in 4. 3. 3. 3. 3.1. 

4.3.4 Test Execution Software 

Software for executing the given test shall be provided. This software shall 
provide for the control of all the simulation and emulation models during a 
test. 


4. 3. 4.1 The execution software shall provide for coordination of timing 
between the various models so that they are all synchronized in relation to 
simulated time. 

4. 3.4. 2 The execution software shall also coordinate the execution of the 
simulations and emulations in the following manner for tests in which fault 
insertion is used. 



4.3.4.2.1 The execution software shall execute the functional level model and 
the environmental model until the time at which a fault is to be inserted. 

4.3.4.2.2 When the fault is to be inserted, the execution software shall 
cause the execution of the functional block in which the fault is inserted to 
switch from the functional level simulation to the gate level emulation with 
the inserted fault. The balance of the simulated system, without the faults, 
will continue at the functional level. 

4.3.4. 3 The execution software shall provide for the data recording specified 
under the test scenario software in reaction to the events specified. 

4. 3. 4. 4 For data reliability analysis collection, the following data 
collection shall be provided. 

4. 3. 4. 4.1 At the start of a sample, sample number, initial conditions 
(external inputs and outputs and internal system state) and all other 
pertinent information shall be recorded. 

4. 3. 4. 4. 2 At the time of insertion of a fault, the sample number, system 
inputs and outputs and internal system state, and all the information 
concerning the fault shall be recorded. 

4. 3. 4.4. 3 At any time during the run, whenever any of the inputs or outputs 
exceed the limits specified under the test scenario software, the sample 
nximber, simulated time, inputs, outputs, internal system state, and value out 
of limits shall be recorded. 

4.3*4.4.4 As the test execution software may very well run for hours or even 
days, it is absolutely mandatory that automatic check-point restart capability 
be provided. 

4.3.4. 5 The execution software shall operate without user intervention but 
shall allow the user to stop the execution and save the system state for later 
re load • 

4. 3. 4. 6 The execution software shall provide support for all of the model 
debug packages specified in 4.3. 2.4. It shall allow execution of the total 
system in debug mode. 

4. 3.4. 7 The execution software shall include the gate level emulation 
algorithm. 

4.3.4. 7.1 The gate level emulation algorithm shall be table driven, using the 
gate tables produced by the translator specified in paragraph 4. 3. 2. 2. 3. 

4.3.4. 7. 2 The gate level emulation algorithm shall be able to emulate at 
least a 6000 gate system. 

4. 3. 4.7. 2.1 The maximum slow down factor for the algorithm operating on a 
6000 gate system, assuming an average gate fan out of 2, 5% of gates changing 
value in any one emulated machine cycle, shall he 3000 times slower than real 
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time. This timing should be based on a 1 microsecond emulated machine cycle 
time. 


4. 3. 4. 7. 2. 1.1 This requirement is based on the results of the feasibility 
study (Attachment 1). 

4.3.5 Analysis Software 

Software to support the reduction of the data gathered during test execution 
shall be provided. This software shall provide for data reduction, 
statistical analyses and reliability modeling. 

4. 3. 5.1 The data reduction software shall allow the analyst to group common 
data and reduce it to necessary components, using the data recorded during the 
test execution phase. It shall allow the extraction of items deemed important 
for a particular use on an individual basis by the analyst. 

4. 3. 5. 2 The statistical analysis software shall provide the capability to 
calculate statistical parameters from the reduced data produced above. The 
following capabilities shall be provided at a minimum! 

1) Matrix manipulation 
Real Matrices 
Complete Matrices 

Eigen values. Eigen vectors 

2) Ordinary differential equations 

3) Regression analysis 

4) Time series analysis 

5) Variance analysis 

6) Interpolation 

7) Numerical integration 

8) Differentiations 

9) Polynomial manipulations 

4. 3. 5. 3 The reliability modeling software shall provide the capability to 
develop parameterized, unified reliability models of the system of interest. 

4. 3. 5. 3.1 The reliability model shall be capable of using the statistical 
data produced from actual test execution as an input in place of predicted or 
expected paramters. 
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4.3.5«3.2 The reliability model software shall suppport development of models 
for fault tolerant, multiple processor systems. 
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This trade study was done to determine the best approach to the hosting of 
the various pieces of software needed by the digitial avionics design and 
reliability analyzer. The trade study was designed to answer the question: 
"should the facility be based on an emulator-only system or emulator/support 
machine system?". The emulator/support system envisions an emulation machine 
connected to general purpose computer. The general purpose computer supports 
most of the software, with the emulator supporting only actual emulations. In 
the emulator-only system, the emulator must support everything. 

II. Methodology 

To perform the trade study the following 5 criteria were established: 

1) Operation speed 

2) User interface 

3) Difficulty of use 

4) Cost of implementation 

5) Size of facility needed 

These criteria were then rank ordered in order of importance (as shown in 
the list above) and assigned weights of 5 to 1 with 5 being the most important 
(operation speed). 

Each alternative was then evaluated for its satisfaction of each criteria 
on a scale of 1 to 10 with 10 being most satisfactory and 1 the least. We 
then multiplied the satisfaction by the criterion weighting to obtain the 
weighted ranking. Weighted rankings for each criterion were then added to 
give a total for each alternative, with the higher score reflecting the "best" 
choice. The results are shown in Table A-1. 

Table A-1 Hardware Composition Trade Study 


C 

o 
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III. Evaluation 


1) Operation Speed - The feasibility study indicated that the primary 
limiting factor for the avionics design and reliability analyzer in 
the gate level emulation mode is the speed of the gate level 
emulation. In the emulator/ support case, the support computer 
removes the burden for support of environmental simulation etc., from 
the emulator. Thus this combination rates 6. A more parallel system 
could get a higher score. The emulator only system rates a 3. 

2) User Interface - The user interface is one of the key items in the 
use of the facility. If this interface is poor, there will be a 
reluctance to use the system. General purpose support machines have 
the user interface as one of their most visible portions and hence, 
modern operating systems have attempted to provide for a flexible 
interface. Emulators, on the other hand have a much narrower 
applicability and hence less attention is paid to such "mundane" 
factors. 

3) Difficulty of Use - This relates not only to the user interface, but 
also to the operating system backing it up. The emulator/support 
combination can use the general purpose operating system to hide the 
tedious details of interaction with the emulation machine which is 
attached to it. In the emulator-only case, the user usually has to 
explicitly deal with the details of the emulation machine. 

4) Cost of Implementation - The emulator/ support system represents an 
increase in hardware cost over the emulator-only system. Comparable 
software needs to be developed in both cases, with the exception of 
the additional driver software necessitated by the emulator/ support 
interface. 

5) Size of Facility Needed - There is no clear indicator that either 
choice represents a better possibility here. 

IV. Final Recommendation 

Based on the established criteria, the emulator/ support system is 
recommended. 
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!• Introduction 

The following trade study was done to determine which microprogrammable 
computers would best serve as the emulator portion of the digital avionics 
design and reliability analyzer. First a search was done to find all 
available user-microprogrammable machines. These were then analyzed to 
determine which ones met the requirements for implementing the NASA Langley 
gate-level emulation algorithm. The machines which met the requirements were 
then compared concerning performance and price. A select few were recommended 
as candidates for the emulator portion of the digital avionics design and 
reliability analyzer. 


II. Microprogrammable Computer Architecture 

A microprogrammable computer is one whose microcode can be changed by the 
user. Microcode, which is stored in control store, consists of 
microinstructions which control the primitive operations of the computer. A 
complex operation performed by a computer can be represented as a sequence of 
microoperations. There are three types of microinstructions: vertical, 

allowing one operation per instruction; diagonal, allowing one or more; and 
horizontal, allowing many operations per instruction, thus increasing 
processing speed. 
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III. Requirements 

There are a number of requirements that must be met by a user- 
microprogrammabla computer in order to implement the NASA Langley gate-level 
algorithm. These requirements are based on a feasibility study implementation 
of the algorithm using the Nanodata QM/1 computer. 

1. The microcode controlling the machine shall be user-programmable 
through software. 

2. The microcode shall provide for parallel operations within a single 
microword • 

3. Control shall be directed from main store via a gate info word of at 
least 8 bits to micro store using a vector mechanism. The gate info 
word contains the address of the location in micro store to which 
control is directed. 

4. Control store shall contain at least one thousand words. 

5. Main memory shqll be sufficient to handle the algorithm, at least 32 

thousand words. 






IV. Computer Search 

Various references were investigated in order to find names of all 
companies manufacturing minicomputers which are microprogrammed. The 
following sources were used; Auerbach Publishers, Inc., Data Pro Information 
Services, Electronic Buyers' Guide 1980, NASA Recon Data Base (remote 
console), Defense Technical Information Center, and the Lockheed DIALOG data 
base. This search resulted in the list of companies shown in Table B-1. Each 
company was then contacted and asked which, if any, of their minicomputers 
were user microprogrammable and could function as emulators. The list of 
computers shown in Table B-2 resulted from these inquiries. 


B-3 


J 



Table B-l. Computer Manufacturers Surveyed 


MANUFACTURER 


COMMENT 


1) Burroughs 

2) Cado Systems 

3) Control Data Corp. 

4) Data General 

5) Digital Equipment Corp. 

6) Digital Scientific 

7) Hewlett Packard 

8) Honeywell 

9) Nanodata Corp. 

10) Northrop Data Sys. 

11) Microdata 

12) Ohio Scientific 

13) Perk in-Elmer 

14) Prime Computer Inc. 

15) Rolm 

16) Sperry Univac 

17) Systems Engineering Lab 


See Table B-2 

Word length limited to 8 bits 

See Table B-2 

See Table B-2 

See Table B-2 

See Table B-2 

See Table B-2 

See Table B-2 

See Table B-2 

Nothing user microprogrammable 
Nothing user microprogrammable 
Nothing user microprogrammable 
See Table B-2 

Only limited information available 
Machine too small; C.S. too small 
See Table B-2 
See Table B-2 
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TABLE B-2. MICROPROGRAMMABLE COMPUTER CHARACTERISTICS 




V. Computer Performance Analyses 

Each computer was then examined to determine whether or not it would meet 
the requirements determined by the study done on the QM/1 using the NASA 
Langley Research Center gate-level emulation algorithm. All machines met both 
the 32K main store requirement as well as the IK control store requirement. 

The following analyses discuss the operation of each machine in relation to 
Requirements 1, 2, and 3. 

1.0 Burroughs BI80Q or B1900 Series 

In this computer cache metnory (2K words) is used as control store, it is 
possible to store all of the microcode in the cache memory. A pipelined 
processor permits fetching, decoding, and executing microinstructions to be 
performed separately and concurrently thus compensating for the limited 
capability of the 16-bit microcode. Memory addressing at the hardware and 
microcode level is accomplished through a 24-bit field address register that 
can directly address 16,777,215 bits as though they were a continuous string. 
Up to 24 bits can be processed in one operation taking 167 ns. Optional port 
interchange enables independent rather than processor-dejjendent access to main 
store by such devices as the multi-line data communications control. The 
18-bit A register contains the absolute "S" or Main Memory address of the 
microinstruction to be executed. 

This machine would be a suitable candidate. 

2.0 Control Data Cyber 18 

The CDC Cyber 18 was designed to emulate the CDC 1700 Series. The 
microprocessor contains 2K to 4K of 32-bit user programmable microcode. One 
type of micro memory consists of 512 words of read/write memory and/or IK 
words of read only memory; the other type contains 2K of read/write memory. 
Each 32-bit microinstruction is divided into five main sections each 
performing a different operation in parallel with the others. The 
microprocessor controls the machine at all times. The process of decoding a 
macroword in main store determines the address of the micro routine which is 
called. 
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The read/write random access memory (RAM-) can either be loaded from an 
external device or data can be written into micro memory under control of the 
micro program. 

Since this machine does have sufficient control store of parallel 
microcode and uses a vector mechanism to transfer control from main store to 
micro store it would be a candidate. 

3.0 Data General Eclipse 

The control store of the Data General Eclipse contains 2K 56-bit words of 
parallel microcode. Each microinstruction is divided into 15 micro fields 
which can be grouped according to the purpose they serve. A word in control 
store is addressed by the 12-bit output of the state change logic which is 
determined by the contents of the True Address bus or the False address 
field. In order to start main memory the CPU places an address on the logical 
address (LA) bus and issues a start signal to memory. Only the module 
containing the nemory location addressed responds to the signal. 

There is a microas scnib ler available to enable the user to write 
microprograms in symbolic form and assemble them to produce a binary object 
file. The microloader is then used to load the object files. 

This machine contains the vector mechanism to address the microcode and 
has flexible ) parallel microcode so it would be a candidate. 
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4.0 DEC - VAX 11/750 

The VAX 11/750 contains 6K of 80-bit microcode. A single microinstruction 
can perform many operations in parallel. The VAX 11/750 was designed as an 
emulator for the VAX architecture and contains IK of user control store. 
Emulation starts with one micro-'>rder called the BUT/IRDl. This signals the 
beginning of the next VAX machine instruction. In the micro-code which 
emulates each VAX instruction, this micro-order is present in the last 
microinstruction. Access to the user control store is by the opcode called 
"FC" in the VAX instruction stream. This opcode results in a branch to a 
location in user control store. From this point on, user microcode has 
control of the micromachine. Control can then be returned to the VAX 
emulation by means of the BUT/IRDl micro— order • 

There are a number of features which support user microprogramming'; the 
data path which includes 18 general purpose 32-bit scratch pad registers, 8 of 
which have ports to both the RBUS and the MBUS; the super rotator, which 
allows very efficient (in hardware) bit picking operations; and a flexible 
ALU. The microsequencer supports general microprogramming in three important 
ways; conditional branching, loop control, and subroutine control. The VAX 
11/750 has six independent flag bits, four of which are always available for 
user microprogramming and two of which are conditionally available. There is 
a 5-bit step counter which can be initialized to any arbitrary value 
(0 ^ X ^ 30), For subroutine control, a 16— deep microstack is available for 
nested subroutine calls. 

This computer does have the required horizontal microcode as well as an 
opcode resulting in a branch to user control store so would be a good 
candidate . 

5.0 DEC - VAX 11/780 

The VAX 11/780 contains IK of 96-bit user control store which is available 
primarily for augmenting the speed and power of the basic machine. It is, 
however, possible to access 4K of ROM containing the operation and sequencing 
of the central processing unit. The architecture and operation of the VAX 
11/780 is similar to the VAX 11/750 as far as the requirements of this 

contract are concerned. 

This machine would be a good candidate. 
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6.0 Digital Scientific META 4 

This machine is designed to be an adjunct processor to a main CPU. One 
possible application is as an I/O processor. The microcode instruction set is 
very structured, 32 bits long. Typical predefined instructions include load 
from control store, move register to register, etc. Control store size is 
limited. Microcode can also read from **main store" via a request, wait 
protocol. 

Microcode operation is not started via a vectored operation and in 
general, this "microprogrammable" machine is typical of a mini computer 
without microprograramability . 

This computer will not provide the capabilities necessary for our 
purposes. Microcode execution is not started via an opcode type operation, 
necessitating bit decoding in the implementation of the algorithm. The 
machine does not have ready access to the larger main store which would be 
necessary to hold gate tables in the algorithm. Finally, the instruction set 
looks like a mini computer instruction se^: and is not flexible enough to do 


^ W ^ ^ M ^ « 1 ^ ^ ^ r.TA A ^ 


7.0 Hewlett Packard 1000 E/F Series 

The HP 1000 E/F Series has 50K of user addressable 24-bit microcode in 
control store with access to 12 scratch pad registers. There are four word 
types of microcode with up to five micro-orders each. Each micro-order 
defines one or more operations to be performed by the computer. 

The control processor, part of the CPU, is always in control of the 
computer, and the base set microroutines cause the read operations to occur 
for all instructions and data from main memory. All 16-bit instructions are 
placed in the Instruction Register (IR) and decoded. The process of decoding 
the IR bits determines which control memory address (which microprogram) is 
called by the instruction received from main memory. Control memory module 
selection is determined by the value of bits 8 through 4 in the Instruction 
Register. These bits help determine the address of branches in the control 
memory base set Primary Mapping Table, which in turn directs a branch to the 
desired module. 
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There is s micro programming support software package consisting of the 
following: 

. RTE Microassembler Program 

. RTE Microassembler Cross-Reference Generator Program 
. RTE Microdebug Editor Program 
. RTE Microdebug Editor Subroutine 
. RTE Driver DVR36 
. WCS I/O Utility Routine WLOAD 
. PROM Tape Generator Program 

The microcode may be loaded into writable control store (WCS) modules or 
may be permanently fused in programmable read-only memory (PROM) chips. 

This machine contains horizontal microcode as well as the necessary vector 
mechanism so would be a good candidate. 


3.0 Honeywell Level 6 

The Honeywell Level 6 contains up to 2K 64-bit words in its writable 
control store. Each 64-bit word is divided into four 16-bit segments each of 
which can be loaded with a separate instruction. Thus, one word may perform 
four parallel operations. Control is transferred from the CPU to the writable 
control store by causing the CPU to issue a megabus cycle (I/O write) 
addressed to the WCS. This operation is performed by the native firmware 
whenever the first word of an instruction lies in the range 0080 hexadecimal 
through OOBF hexadecimal (64 bits). The location to which control is 
transferred is one of the first 16 locations in the WCS^ the specific location 
is identified by the least significant hexadecimal digit of the instruction 
word. 

There is a WCS assembler available to assemble firmware routines as well 
as a loader to load the assembled routines into the WCS. A microcode analyzer 
is available to selectively display pertinent CPU and WCS information for 
debugging microprograms. 

Due to the horizontal microcode and the vectoring effect transferring 
control from the CPU to the microcode, this machine would be a candidate. 
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9.0 Nanodata QM/1 

The Nanodata QM/1 is unique in that it is specifically designed to emulate 
other computers. There are two levels of microprogramming with the lower 
level called nanoprogramming. The top level microprogram is an 18-bit 
vertical microcode having many of the characteristics of an assembly 
language. The lowest level microcode is a 360-bit horizontal word (144 bits 
of which are active at any one time) which interprets the higher level 
microcode. The identification of the nanoword which interprets a given 
microinstruction is determined by 7 bits in the 18-bit microword itself and a 
3 bit page indicator in a CPU store register, giving a total of 10 bits of 
address to cover the 1024 words of nanostore. 

The control store limit is 40K words. For the Langley algorithm, the 
algorithm would be coded in nanocode, using control store to provide the 
vector into the proper nanoword and to hold the gate state information. Based 
on its architecture and the actual implementation of the Langley algorithm for 
the QM/1 under the feasibility study, the QM/1 is a suitable candidate. 

10.0 Perkin Elmer 3320 

The Perkin Elmer 3320 contains 2K 32-bit words of writable control store. 
WCS is addressable through ROM location counter (RLC). There are four 
assembly level instructions which enable the user to write into WCS, read from 
WCS, and transfer control to WCS resident microcode. Unfortunately the 2K 
words of the WCS serve as a supplement to the fixed control store; the user 
cannot delete or modify user level instructions or machine features located in 
the ROM control store. If an operation does not exist in ROM, it cannot be 
used in WCS. A new emulator cannot be created in WCS; the user can only add 
to the existing one. For this reason, this machine would not be a suitable 
candidate. 

11.0 Sperry Univac V77-800 

The Sperry Univac contains 2K of 48 bit microcode in writable control 
store (WCS) with space for IK 48-bit ROM storage. Each microword executes 
multiple operations. The WCS acts as an extension of the processor control 


store. 


The WCS contains a decoder control store, a central control store (CCS), 
and an I/O control store. The decoder control store consists of two 16-word 
by 16-bit memory arrays with associated logic that decodes main memory . 
instructions into a 9-bit address which is applied to the CCS. Addressing for 
the 64-bit microinstruction is provided by the 9-bit address from either the 
processor, decoder control store, or subroutine stack. 

The microcode is input as a series of source statements via a terminal or 
card reader using the operating system VORTEX II or SUMMIT. The 
Microassembler, MIDAS, is then used to transform these statements to object 
code. The object code is then loaded into WCS using the microutility, MIUTIL. 

This machine does contain horizontal microcode as well as the necessary 
vector mechanism to control store and would be a viable candidate. 

The architecture of the V77-600 is the same except that there are 4K 64 
bits of WCS. Thus this machine would also be a candidate. 

12.0 Systems Engineering Laboratories 32/70 Series 

The SEL 32 Series contains 4K 64 bit high speed Random Access Memory (RAM) 
as a physical extension of Control Store (CROM). The microinstructions 
contained in WCS allow parallel operations within the execution timing of a 
single instruction. 

The writable control store (WCS) may be used as a CROM extension in the 
host computer, or it may be used with the Development Support System (DSS), 
residing in the DSS Test Stand. The CROM takes an instruction from Main 
Memory and stores it in a 32-bit internal register (II). An appropriate 
microprogram is executed and the contents of register II are moved to register 
10 (a 32-bit register). The CROM entry point is determined by a decode of the 
contents of register 10. The CROM contains a series of read only memories 
(ROMs) which contain the decode and vector tables within CROM to the 
microprogrammed routines that operate the computer. 

Entry into the WCS from software is accomplished using the JUMP WCS 
Macro-Assembler instruction. This instruction allows the user to jump to any 
of the first 64 locations in WCS where vector addresses (in microcode) are 
stored, which address routines within the WCS. 

The writing of WCS is accomplished using the WRITE WCS Macro-Instruction. 
The reading of WCS is accomplished using the READ WCS Macro-Instruction. 
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' since this machine does have horizontal microcode and does have the vector 
mechanism from main memory to control store it would be a suitable candidate. 


In addition to the analyses that were done to determine whether each 
computer met the requirements, an algorithm was used to rank the computers 
with respect to those characteristics necessary to the solution of the 
gate-level algorithm^. The following equation was examined then altered to 
better fit the algorithm requirements: 

10^^ C(L-7) (T) (WF)] ^ 

P= [32,000 (36-7)] ^ 


where 

P- the computing power in bits per second 
L= the word length in bits 
T= the total number of words in memory 
WF* 1 for fixed word length memory 

2 for variable word length memory 
tc= the time in microseconds for the CPU to perform one million 
operations 

t^^Q= the time the CPU sits idle waiting for I/O to take place 
i- .5 


1 Knight, Kenneth E.: Changes in Computer Performance, Datamation , vol. 12, 

no. 9, pp. 40-54, September 1966. 
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The above equation was altered to include only those parameters relevant to 
the implementation of the NASA LRC gate-level emulation: 


C(L-7) (1) (1)1 ^^^ 

p'= K32,000) (36-7)] ^^^ 

[2 (CS) + (m)~ 

3 


where 

CS- control store cycle time in microseconds 

M= main memory cycle time in microseconds 

P= a measure of the bits processed based on a weighted average cycle time 

A weighted average of the control store cycle time and the memory cycle time 
was chosen as the control store is accessed more frequently than the main 
memory so its access time should carry more weight in analyses of the overall 
performance. 

Xhe measurement P* is not meant to be a direct measurement of the power of 
each machine but more of a relative measurement of performance to aid in 
choosing the computer which best fills the requirements of this contract. 

The value for P* for each computer was then scaled to fall between 1 and 
100 in order to more easily rank the performances. These numbers appear in 
Table A-2 under the heading ''Performance Rating". The value for the 
performance rating was then divided by the cost of the CPU with minimum memory 
(at least 32K words) plus control store to give a value for performance per 
dollar. These values were then scaled to fall between 1 and 100 to give each 
candidate a "Performance per Cost Rating". 

The prices quoted in Table B-2 represent only the price of the CPU with a 
minimum of memory plus the control store. They do not reflect the price of 
interfaces, consoles, printers, etc. They should be used only as a general 
basis for cost comparison. 
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VI. Conclusion 

The machines which have been recommended as final candidates were chosen 
more on a basis of performance than cost due to the stringent requirements for 
supporting the gate-level algorithm. 

The performance of the QM/1 is far superior to any of the other machines 
studied. There are a number of machines that compete for second place such as 
the DEC VAX 11/750, DEC VAX 11/780, Honeywell Level 6 Model 43, Systems 
Engineering Lab 32 Series, Data General Eclrose, Sperry Univac V77-600, and 
Sperry Univac V77-800. Comparing the two VAX machines, one would eliminate 
the VAX 11/780 on basis of cost. The Univac V77-800 could be eliminated for 
the same reason. Any of the following machines would be good second choices: 

1) DEC VAX 11/750 

2) Honeywell Level 6 Model 43 

3) SEL 32 Series 

4) Univac V77-600 

5) Data General 

The QM/1 far outperforms those machines in second place and would be the 
recommended choice for the emulator portion of the Digital Avonics Design and 
Reliability Analyzer. 
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I. 


INTRODUCTION 


This interim technical report details the results of Martin 
Marietta's implementation on the Nanodata QM/1 of an algorithm for the 
emulation of digital devices at the gate level. The implementation is 
intended to prove the feasibility of using emulation technology for 
data collection in support of reliability studies of fault tolerant 
digital avionics equipment. From the high level point of view, it 
is clear that that feasibility depends primarily on the adequacy of the 
speed improvements emulation seems to offer over simulation. That is 
to say, the most useful measure of the feasibility is the time required 
to perform a "sufficient" number of experimental runs to give statistical 
significance to the results obtained. 

The specific algorithm which we implemented was developed by the 
NASA Langley Research Center. The algorithm has two significant factors 
Inherent in its use. First, it doesn't require examination of every gate 
in the system and second, it allows treatment of every gate in the same 
manner, regardless of gate type. The algorithm is described in detail 
in sections III, IV, V and Appendix A. 

To provide a basis for the actual timing figures, Section III 
provides a discussion of the basic operations involved in the algorithm 
and the basic timing considerations in the QM/1 to try to determine a 
"best case" slow-down factor for the algorithm (i.e., an indication of 
the best we can do in terms of speed). This section contains a brief, 
high level overview of the philosophy of the algorithm and provides 
an introduction to the more complex discussion in Section IV. 

Section V gives details of our implementation including considerations 



of memory requirements and system size. The timing results are detailed 
in Section VI and include graphs to determine predicted performance 
for systems of varying sizes. Section VII presents our conclusions 
based on the implementation and timing studies. Appendix A contains 
the rationale and basis for uniformity of gate treatment. Appendix B 
contains the derivation of the timing equations used for projection, 
and Appendix C contains the nanocode for the "best case" timing 


analysis . 


II. SUMMARY 


This report details the results and conclusions of Martin Marietta's 
implementation of the NASA Langley Research Center's gate level emulation 
algorithm on the Nanodata QM/1 computer. The implementation was done 
to determine the applicability of emulation technology to reliability 
analysis of digital avionics systems. This determination has focused 
primarily on the speed aspects of the emulation and the time necessary 
to run a sufficient number of sample cases to provide significant results. 

The slow-down factor of the emulation is based on four primary 
considerations: 

1. The system size (number of gates): 

2. The average percentage of gates changing value in a 
machine cycle; 

3. The average fan-out of the gates in the system; 

4. The machine cycle time of the system under study. 

Using a system size of 2000 gates, and assuming 5% of the gates change 
value, with an average gate fan-out of 2.0, and a machine cycle time 
of ^s, the actual slow down factor based on the implementation was 
found to be 1200:1. This is compared to a best possible slow-down 
of 600:1. The 1200:1 figure means that 10,000 samples of 0.1 seconds 
real time per sample would take 17.2 days of emulation processing, 

a span which is entirely reasonable in relative to the kinds of numbers 
seen .in previous studies (i.e.[3j)« 

The largest control-store resident system possible under the same 
constraints (6000 gates) also exhibits a reasonable slow-down factor 



of 3500:1. However, in attempting to extend the emulation capability 
beyond 6000 gates, we found that the processing time is overshadowed 
by the time it takes to load data into control-store, and hence this 
mode of operation is not feasible. 

The basic conclusion of the report is that gate level emulations 
of systems up to 6000 gates is feasible within the constraints imposed 
by the architecture of the QM/1. 


III. BASIC ALGORITHM DESCRIPTION WITH PRELIMINARY TIMING ESTIMATE 

In our discussions with NASA Langley Research Center Personnel, 
we have been given several estimates of slow-down factor expected by 
them in the QM/1 implementation of their algorithm. These have ranged 
from a low of 300:1 to a higher range of 500-600:1. From our imple- 
mentation of the algorithm, these figures seemed very optimistic. The 
following discussion is an attempt to define a possible, reasonable lower 
bound on the slow-down factor, taking into account QM/1 and nanocode 
realities as well as the operations necessary because of the algorithm. 

For the analysis which follows, we assume that the reader is 
familiar with the basics of the QM/1 and its nanocode. We further 
assume that the reader is familiar with the algorithm implemented 
(described briefly below) . Please note that in these assumptions we 
do not require a working knowledge of either item, only a familiarity 
to the extent that allows an understanding of the terms involved. For 
example, most of our discussion will be based on the basic time cycle 
in the QM/1, the T period (80 ns). It is sufficient for the 
reader to realize what a T-period is and what it means in relation to 
execution of a nanoword. 

The NASA LRC algorithm is conceptually straightforward. There is 
one item requiring acceptance on the reader's part. The algorithm as 
defined allows all gates of any type (AND, OR, NAI^D, N0R,X0R, etc.) to 
be treated identically after initialization of the value of the gate 

and a quantity called CNT (count) which relates to the number of Inputs. 
The algorithm has a major and a4ninor loop as shown in Figure III-l. 
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The driver on the, major loop is a queue consisting of those gates whose 
value has changed. The minor loop is comprised of examining each gate 
which is connected to the output of the changed gate to see if the 
change affects the value of the output gate. For example, we will use 
the 3 gates shown in Figure III-2. Suppose the value of gate A has 
changed (the mechanism for this is described later). The algorithm 



then specifies that gates B and C must be examined to determine if the 
change in the value of A will cause a change in the value of either 
B or C. This is determined by updating the-€NT quantity for the output 
gate (B or C) based on the new value of the input (A) and determining 
if CNT, by virtue of that update, transitions into or out of zero (see 
Appendix A for an explanation of this mechanism) . Transition of CNT 
for the output gate into or out of zero indicates that the value of the 
output gate (B or C) changes value. If the value of the output gate (B 
or C) does not change because of the input (A) value change, no further 
action is necessary. If the value of the output gate (B or C) does 
change, the gate is added to the next cycle's changed-value queue (as a 
future x-gate) to allow examination of the effects on its outputs. 

For brevity in our discussion, we will term the changed-value 
queue and its processing "the x queue" and "x processing". We will 
similarly term the output gate processing to be "z processing". The 
action of the algorithm then causes a z whose value changes to become 
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an X for the next "cycle" (for its outputs to be examined) . In this 
case, a "cycle" represents the propagation of the signal through one 
logic level; and a "machine cycle" would be completed when the x queue 
becomes empty (i.e., when the logic circuit has reacted to changed 
inputs and the circuit has settled to quiescent values) • 

There are several benefits derived from the algorithm. For x 
processing, only those gates whose values change need to be examined. 

Further, only the outputs of those gates must be known. This contrasts 
with the typical "brute force" algorithm which requires examination of 
each gate and subsequent examination of all of its inputs to determine 
its value. 

For the analysis below, the only way a gate is put on the x queue 
is by virtue of its having been examined during z processing and found 
to have changed value. We will term this the non-null case of z proces- 
sing. Thus we have the following relationship 

# X gates processed. = # non-null z gates processed ( 1 ) 

This ignores the mechanism for starting the cycle, so we will cover 
that later. The null z process is the other case for z’s and repre- 
sents basically no operation (i.e., no action necessary since the 
gate doesn’t change value). 

To provide some quantitative values for our discussion we need to 
make some assumptions concerning the system being modeled at the gate 
level. For example, we need to fix the size of the system. This is due 
to the fact that the slow-down factor is directly proportional to the system 
size (actually the proportionality is based on the number of gates whose 
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values change, but this is related to system size). Gate proces- 
sing must proceed sequentially, while the modeled machine cycle 
time is fixed. Therefore, we based our analysis on the following 
assumptions; 


1. The system under consideration contains 2000 gates ; 

2. Only 5% of the gates will change value in any machine 
cycle (x processing^ 

3. The average gate fan-out is 2 (there are 2 z^s per each x); 

4. The basic machine cycle time of the emulated system is 0.1/ 


For discussion purposes, any further reference to cycle, cycle time or 


uiHchiue cycle wrll refer to the nicdeled machxne cycle time and repr€ 


the time for the logic circuit in the modeled machine to react com- 
pletely to a change in input. This represents the real time against which 
the algorithm is measured. 


A second set of assumptions is necessary for this analysis. This 
second set relates to the data structure upon which the algorithm 
is built. For this discussion we will assume the following structure 
(shown pictorially in Figure II1-3) . 

1. Each gate is characterized by a gate info word. This word 
contains the gate value, CNT and various other information. 

2. The X queue is a linked list with the link word following the 
gate info word in memory. The link word contains the address 
of the next gate info word in the queue. We will call the 
link word "LINK". 






















11 


3. The address of the first gate info word in the queue is 
kept in a local store register designated MLINK. 

4. The output gate addresses are kept in a separate section 
of control store in consecutive order. That is, the 
address of the gate info word for output 2 of gate y follows 
immediately the address of the gate info word for output 1 
of gate y. 

5. The address of the output list for each gate is contained in 
a word following the queue link word (LINK) . We will call 
this word "CLINK." 

Ppem. the diagram of Figure III-3, we see the gate whose info word is 

located at address nri-9 has two outputs. The first output is the gate 

whose info word is located at address mfl5 and the second is the gate 

whose info word is located at address m+-6. This is found by following 

the CLINK to address n to address nri-15 and then following address n+1 
to address m+6. The x queue in that figure goes from the gate of 
address mfl8 (due to MLINK) to mfl2 (LINK) to m+9 (LINK) to m (LINK). 

The data structure just presented represents, in a somewhat simplified 
manner, the data structure used in our actual implementation. The 
specifics of the implemented algorithm are covered in more detail 
in Section V. 

Given these assumptions concerning system size, the definition 
of the algorithm and the structure of the data in QM/1 control store, 
we can now begin to analyze the potential slow-down factors based 
on those assumptions. One further assumption which is Inherent 
in the following analysis is that the decisions concerning 


12 


the actions to be taken in z processing are made in a highly parallel 
fashion using the QM/1 microinstruction execution feature [1:66] and 
local store register R31. Using this feature, testing of bits in the 
gate info word is done quickly in a highly parallel fashion requiring 
a very small amount of time. 

For the first cut estimate of slow-down, we will ignore the time 
necessary for x processing and concentrate on what is required for z 
processing. However, to determine the number of gates examined in 
z processing, we must use system assumptions 1 and 2 to determine the 
number of x's processed and then multiply that by the 2 from system 
assumption 3 to give us the total number of output gates examined 
(i.e., 2 outputs per x = total z's). Therefore we have the following: 

# X gates processed = (System size) X Qi system changing 

= 2000 gates X 5% 

= 100 gates (2) 

// z gates processed = // x gates processed X fan— out of x gates 

= 100 gates X 2 
= 200 gates 

Now using equation (1), we find the number of non— null z cases. 

Equation (1) stated 

# X gates processed = # non-null z gates processed (1) 

Therefore, we have 

# non-null z gates processed = # x gates processed 


= 100 gates 


(A) 
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Which means that of the 200 z gates we have, half are the non-null 
case and the other half are the null case. Now that we have fixed 
the amount of processing to be done, we need to get an estimate of 
the time necessary to do each case. We will consider the null case 
first. The shortest nanoword in the QM/1 which does not branch to 
itself looks like [1:58]: 

Tn: READ NS (not stretched) 

Tn+1: GATE NS (not stretched) 

or 

Tn: STRETCH, READ NS, GATE NS 

We need to explain our notation in the above two examples. The Tx to the 
left indicates the T-step (not T-period which is fixed, but T-step which may be 
either 1 or 2 T-periods long). In the first case, the T-steps are not 
stretched which means they are each 1 T-period long. In the second, 
the T-step is stretched indicating it is 2 T-periods long. For those 
more familiar with nanocode notation, this can be shown thus: 

X . . . READ NS 

.X . . GATE NS 

Q,r 

S . . . READ NS, GATE NS. 

The net result is that the null z processing requires, in the best 
possible case, 2 T-periods. 
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For the non-null case, we need to do more in the nanoword than 
simply branch out. Let us assume that we can do all necessary processing 
in one nanoword. To determine the length of that nanoword let us 
examine the length of a set of nanowords. The set of MULTI nanowords, 
consisting of 124 words, represent 669 T-periods. This works out to: 

T— periods /nanoword = 669 T-periods/ 124 nanowords 

=5.39 T- periods/nanoword (5) 

This result fits in well with intuition in which we realize that the 
case where none of the T-steps in a nanoword are stretched is relatively 
rare, and that in most words observed, at least one and occasionally 
2 of the T-steps are stretched. Thus, without considering the exact 
operations to be performed, we will use a 5.4 T-periods/nanoword figure 
for the non-null z processing nanoword. 

Now we have the information necessary to calculate the z processing 
and the absolute best case slow down factor. The time required is 
given by 

null* z processing = # null z gates X 2 T/ gate 
= 200 X 2T 
='200T; 

non-null z processing = if non-null z gates X 5.4 T/gate 

= 100 X 5.4T 
= 540T; 
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total z processing “ null z processing + non-null z processing 

» 200T + 540T 
= 740T. 

To translate this to understandable terms, we use the 80ns/T-period 
conversion to get: 

time = 740T X 80ns /T 

= 59200ns = 59.^ 

For our given machine cycle time 
the slow-down factor is given by: 

slow-down factor = actual time 

machine time 

= 59.2>s 

0 . 

= 592:1 

This 600:1 factor is close to the NASA LRC expected slow-down in 
their 500-600:1 estimate. What is significant about this figure is 
that our judgment about 600:1, which appeared to be an optimistic 
figure is proven to be true. This slow-down factor is based solely 
on z processing, does not include x processing at all, and in addition, 
does not include most of the processing necessary for z’s. 

Let us look further into the x processing. This processing 
must consist minimally of : 

1) Reading in the x gate info word from control store; 

2) Reading in the address of the output list (CLINK) from control 


of O.^s (system assumption 4) 


store; 
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3) Reading in the address of the first output gate (z); 

4) Reading in the gate info word of the first output gate (z); 

5) Reading in the address of the second output gate (z); 

6) Reading in the gate info word of the second output gate (z). 

Steps 3-6 depend on there being 2 outputs per gate. Referring back to 
Figure III-3, and using the gate whose info word is at m+9 as the x 
gate, step 1 reads the info word from address m+9 into a local store 
register. Step 2 reads the CLINK word at address m+11 into a local 
store register. This register contains the address n. Step 3 reads 
the contents of address n into another local store register. This 
register now contains the address (m+15) of the first output gate. 

Step 4 reads the gate info word for the gate at location m+15 into a 
register. Step 5 reads the contents of location n+1 (m+6) into a 
register. Finally step 6 reads the gate info word for the second output 
into a register. Thus the minimal x processing consists of steps 1-6. 
Now to estimate the timing on this, assxjming best possible case, we 
will consider the time necessary to do the 6 reads. We assume address 
formation takes no time. Based on the timing constraints for control 
store [1:36], if we set up for the control store read in Tn, (assuming 
all T-steps are non-stretched and correspond to 1 T-period) the READ 
CS cannot legally occur until Tn+2. In the best case, we can also set 
up for a new read of control store in Tn+2, which produces the timing 
sequence below; 

set up for read of x gate info word 


read x gate info word, set up for read of CLINK 
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read CLINK, set up for read of first output address 

read first output address, set up for read of first z 
gate info word 

read first z gate info word, set up for read of second 
z address 

read second z address, set up for read of second z 
info word 

read second z info word. 

Thus, the best case for x processing is 13T~periods per x. Now let us 
examine the z processing. In the non-null case, we used one nanoword 
to do the setting of bits, etc., necessary in processing a z. We now 
need to add in the time necessary to link the gate into the MLINK, 

LINK queue. A measure of this task can be gleaned from the MULTI 
instruction ENQ. ENQ is an enqueue instruction designed for creating 
linked lists. It takes 27 T-periods [2:60]. We could possibly do better 
by using an ST (store) instruction of MLINK into the new gate^s LINK 
and then an MVR (move register) of the new gate^s address into MLINK, 

This approach requires 7T for the store and 5T for the MVll [2:54-55] 
for a total of 12T. We might further assume that custom nanocode could 
speed this up by 1/3 for a time expenditure of 8T. (8T is very close 

to the time necessary for this operation in the actual implementation.) 
Based on these new numbers, we can calculate a new lower bound on 



slow-down factor; 


X processing time = # x gates processed X 13T/x gate 
= 100 gates X 13T/gate 
= 1300T; 

null z processing time = # null z gates X 2T/gate 

= 100 gates X 2T/gate 
= 200T ; 

non-null z processing time = # non-null z gates X (5.4T/gate 

+ 8T/gate> 

= 100 gates X 13.4 T/gate 
= 1340T; 

total processing time = 1300T + 200T + 1340T 

= 2840T. 


This translates to: 

time = 2840T x 80ns/T 

= 227200ns = 227, 2^ 
slow-down factor = 227 , 2»s 

= 2272:1 slow down (for a •1/s machine cycle) 

This figure is much more realistic than the 600: 1 figure obtained 
before, but it is important to note that our inclusion of T-periods for 
processing in this analysis does not begin to approach what is necessary 


in the actual algorithm. 



We propose to iterate through the calculations one final time, 
developing motivations for additions to the time estimates we have 
presented and ultimately defining a realistic best case estimate 
of the time required for performance of the algorithm. 

To begin the analysis for this last iteration, we will modify 
somewhat the allocation of timing between the x and z processing. By 
this, we mean that the stepping down the output list and reading of the 
output gate info word is not really a function of x processing but 
belongs more properly in z processing. We will shift it into z pro- 
cessing for one primary reason. The x processing pipelined read of 
the outputs in the last analysis is not practical and really cannot 
be done in that fashion. The practical implementation is: read of 

one output; process the output; then loop back and read the next 
output. So, in the first step, we have taken 10 T-periods out of the 
X processing. (Time for read of CLINK and each of the output addresses 
and info words.) At this point x processing consists of reading 
only the x gate info word and requires 3 T-periods per gate. 

As you will remember, the 3T estimate assumed that address 
formation took no time. In actual fact, if we assume that the address 
is in a local store register, address formation only takes the time 
necessary to set up the busses to use that as the control-store address. 
This adds 1 T-period. Thus to read the gate info word for an x requires 
4 T-periods. 
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The next thing we need to do for x processing is to use the QM/1 
micro-instruction execution capability to do a multi-way branch based 
on the data in the info word. Since we want to branch on more than 
the 7 bits available in the QM/1 local store register 31 C~field, we 
need some extra processing to set up the proper address. This pro- 
cessing, plus the multi-way branch itself, requires 5 additional T- 
periods. (The minimum nanocode segments are given in Appendix C;) 

Thus the basic x processing set up takes 9 T— periods. Based on our 
implementation, the actual x processing takes from 2 T-periods (for 
a gate not properly queued; i.e., no action necessary) to 9 T-periods 
for a gate requiring more complex processing. The time for the most 
standard processing (gate normally queued) is 5 TVperiods. Thus for each 
x: set up, multi-way branch and x processing takes 9T + 5T = lAT. 

The only remaining step is to set up for processing of z s for each 
X and the set up (address formation) for processing the next x in the 
queue . 

The set up for z processing consists of calculating the address 
of the CLINK word for this gate (gate info word address +2) then 
reading in CLINK to get the address of the output list. This processing 
takes 6T. The end of x processing for the current gate consists of 
setting up for the next x. This involves calculating the address of 
LINK and then reading the value of LINK. This takes 6 T-periods. 

Thus, the total x processing is given by; 


\otal 

^total 


V +X. +x +x 

set up oranch proc z set up next 

4T + 5T + 5T + 6T + 6T 

26T 


( 6 ) 
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Z processing consists of setting up for the processing of the current 
output, doing the actual processing, and doing the preliminary set up 
for processing the next output. The first part consists of reading the 
address of the gate info word for this output and then reading the gate 
info word itself. This is then followed by the multi-way branch 
(similar to the x processing multi-way branch) . This operation takes 12T 
(see Appendix C) . 

Actual z processing takes 2T for the null case, and from 5T to 1ST for 

the non-null case. To this 5-18T we need to add the time necessary to add 

this z to the queue. This time is 6T. Thus, for non-null z processing, 
using the most common case of 7T for processing plus 6T for the queue 
addition, we need 13T. So, for z processing itself we have: 

null z processing = 2T; 
non-null z processing = 7T + 6T = 13T. 

The final set up for next z is essentially included in the set up 

for this z. The only thing that is not done is the testing if this 

is the last output. We will assume the sign bit in the last output 
is set to 1. The time required to do this test is 4T if it is the last 
gate and 8T if it is not. (We will use 6T for our figures based on 
an average fan-out of 2.) Thus, the total z time is: 


^total 


z z z 

set up process next 

12T + 2T + 6T (null z) 


= 20T (null z) 

= 12T + 13T + 6T (non-null z) 

= 31T (non-null z). 


(7) 

( 8 ) 
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We now have all the figures necessary to calculate best case slow-down. 
As an aside, please note that the nanocode given in Appendix C will 
not work if put together. The most striking example of the reason for 
this is the processing to determine if this z was the last in the output 
list. Remember we assumed the sign bit was set. This means that when 
we read the gate info word, we would have to clear all sign bits before 
the read. This is not accounted for in the nanocode of this example. 

There is also no provision for testing the last x in the queue. But 
as a best case timing estimate, these figures define the range of 
numbers involved. So, the calculation of slow-down factor looks like: 

X processing time “ # x gates processed X 26T 

= 100 gates X 26T 
» 2600T; 

null z processing time = # null z gates X 20T 

= 100 gates X 20T 
= 2000T; 

non-null z processing time = # non-null z gates X 3 IT 

= 100 gates X 31T 
= 3100. 

This translates to: 

time = (2600T + 2000T + 3100T) X 80ns/T 
= 7700T X 80ns/T 
= 616000ns = 61^s. 

Thus for a . Itos cycle machine, the slow-down factor is 6160:1. 


In summary, it is obvious that there are several parameters 
which determine the slow-down factor for a given case. The parameters 
are: 


1. system size in total gates; 

2. number of gates in system which change value during a 
cycle (average) . This may be expressed as a percentage 
of system size ; 

3. average fan-out per gate; 

4. cycle time of the emulated system* 

For our analysis here we assumed that: 

1. system size » 2000 gates; 

2. percentage of gates changing - 100 gates = 5%; 

3. average fan-out = 2; 

4. cycle time of the emulated system = 0-^s . 

In the following sections, we present some details of the algorithm 
we implemented and the results of our timing studies. Since those 
timing studies address a ^s cycle time machine (10 times slower than 
the machine we assumed here) we can recalculate the slow-down for our 
idealized implementation. It then becomes 616:1. Remember that this 
does not take into account all of the necessary actions. It is thus 
reasonable to expect that the slow down factor for a machine to 

be best case 600-800:1 and for a .^s machine to be 6000-8000:1. In 
summary, for the slower machine, with a smaller system (2000 gates), the 
LRC estimate of 500-600:1 is quite optimistic but still a reasonable figure. 


IV. ADDITIONAL FEATURES OF THE ALGORITHM 

The algorithm introduced in the previous section includes additional 
elements which allow it to handle the types of situations expected in 
real-world applications. Aside from handling all types of gates in 
the same manner, and being able to quickly process gates without having 
to review every input of every gate, this approach takes into account 
the possibility for double queueing. If two gates share the same 
output gate and both change value such that the output gate should 
change, they will both independently queue the same gate for processing. 
If this happens within a single logic level, it is quite possible that 
the common output gate should in fact not be queued at all, since as a 
result of both inputs its value should remain the same. To handle 
this sort of case, the NASA LRC algorithm includes processing which pre- 
vents unnecessary queueing, as well as a second set of flags (V2 and A2) 
and a second queue linkage word (LINK2) which are used as a means of 
remembering the necessary data for processing an additional queueing 
if in fact one is required. This latter situation arises when double 
queueing is spread over two propagation cycles (two logic levels). 

An additional feature of this approach is that flip-flop devices 
are treated as ordinary gates with some additional special case con- 
siderations. The flag FF is used to indicate a flip-flop device and 
enables the algorithm to handle such a device effectively. Involved 
in this process is the flag T, which indicates a trigger input for a 
flip-flop, requiring a slight variation in treatment. 

A list of the variables involved in this algorithm and their usage 
is provided in Table IV- 1 , and a section of the algorithm dealing with 
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double-queueing is detailed in Figure IV-1. It is entered only if during 
the normal processing of an output z-gate, that gate’s CNT value transi- 
tioned into or out of zero. (Thus indicating that this z gate should be 
queued for future x gate processing.) The variables of most concern here 
are: 1) the "properly queues" flag which indicates that a gate is 

queued for x processing, and when needed for double queueing enables 
a gate to be "dequeued" without actually dequeueing the gate itself; 

2) the "cycle queued" flag A^, which remembers in which propagation 
cycle (C) this gate was queued for x processing (propagation cycles indi- 
cated by C are equivalent to x queue processing cycles, and each repre- 
sents the processing of one. logic level of the system) ; 3) the current 
value of the gate V^; and 4) the linkage variables LINK^ and MLINK 
used in the linked-llst x queue whose ties between gates define the 
course of any given processing cycle. 


The variables concerned with the queueing of a gate onto the 
second x queue include V’ 2 , A^, A^, and LINK 2 . Although we imple- 

mented the algorithm as given to us, we feel there are some functional 
discrepancies involved in the manner in which these variables are used. 
The concept presented here, however, is of more importance than the 
details of its design in the algorithm. The need being addressed 
here concerns the queueing of gates a second time. (This situation 
arises when double queueing occurs over two consecutive propagation 
cycles, as discussed briefly above.) The idea is to remember what 
the value of the gate is at the time of the second queueing, in 
order to correctly process the gate when its first processing cycle 
begins; and to queue the gate properly for a second processing. The 
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area of the flow in Figure IV- 1 encircled with dashed lines attempts 
to accomplish these goals. The flag indicates that the second queue 
is employed for this gate; is later to become the A^ of the second 
processing cycle, and indicates that proper queueing has occurred; and 
V 2 remembers the current newly changed value of the gate. (The second 
queue linkage involving LINK^ and MLINK in the given algorithm does 
not work properly when integrated with the normal linkage system using 

LINKj.) 

These additional features do cause extra overhead in the 
execution of the algorithm, but they enable the algorithm to emulate 
a wider range of real-world systems and to accommodate all the currently 
foreseeable events which occur in gate level emulation. 
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Variable 

Ai 

A2 

A3 

A4 

Vl 

V2 

rr 

rr 

T 

CLINK 

LINKj 

LINK^ 

MLINK 

C 


Definition /Usage 

"Properly Queued" Flag; Indicates Gate Queued For 
X Processing. 

For Second Queue. 

"Cycle Queued" Flag: Indicates Value Of C When Gate 
Queued For X Processing. 

Flag Indicating Gate Queued Onto Second Queue. 

Current Output Value Of This Gate. 

Gate Value For Processing Of Second Queue. 

Flag Indicating A Flip-Flop Device. 

Flag Indicating A Flip-Flop Trigger. 

Pointer Word Containing The Address Of The Output 
List For Each X Gate. 

Linked-List Linkage Word Pointing To The Next Gate In 
The Queue (Zero If Last Word In The Queue). 

LINKj For Second Queue. 

Pointer To First Gate Of The Next Queue (Each X-Queue 
Cycle Starts A New Queue), And Zero If End Of Machine 
Cycle. 

Propagation Cycle (X -Queue (^cle) Indicator (Alternates 
Value For Each Logic Level Procesed). 


TABLE IV-1: Definition Of Variables 
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Figure IV-1 Flow Of Double Queueing Logic 
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V. IMPLEMENTATION OF THE ALGORITHM 

In implementing the NASA LRC algorithm previously described, we 
have used the unique micro-instruction decoding capabilities of the QM-1 
as a means of efficiently handling all of the individual flag conditions 
which arise in the course of normal processing* It is important to 
recognize in this algorithm the inherent dependence upon individual 
flag-bits and the large amount of processing necessary to handle them 
properly. Conventional coding methodology requires these flags to be 
tested and manipulated individually (which can be quite burdensome). 

A great deal of speed and flexibility can be gained by combining all 
of these flag-bits into one n-bit computer word, and subsequently using 
this word as the address of a specific routine in memory written to 
handle the exact bit pattern found in that arrangement of flag-bits. 

Thus we associate one word of data with each gate in the system, and 
we arrange that word so that each bit is dedicated for use as a specific 
flag. Then when the value of a flag-bit is needed to be known in order 
that some, action may be taken, rather than reading each bit and testing 
for one or zero, the entire set of flag-bits is taken together as a 
’’condition set” and used as the absolute address in nanostore of the 
routine which performs the exact actions necessary under the conditions 
specified by the flag-bits. In addition, when processing of that gate 
changes one of these flags, the appropriate bit of that gate’s ”info 
word” is changed to reflect the latest condition of the flag. This is 
very fast and very effective, but it does require a great deal of memory 
(in this case, nanostore). For our use, however, this drawback is far 
outweighed by the execution speed and flexibility gained. 


Figure V-1 gives an overview of the logic used to implement the NASA 
LRC algorithm. The boxes containing an asterisk (*) or asterisks (**) include 



Figure V-l Overview Of Implementation Logic 
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the type of processing described above. Before detailing these however, 
it is first appropriate to examine the lay-out of the memory tables 
used and the reasoning behind their structure. 

Figure V-2 shows the structure of the "blocks" of gate data which 
reside in main-store and are loaded one at a time into control-store 
for processing. (Note that for an all control-store resident system 
no loading of blocks nor pre-x-cycle z-gate processing is needed.) 

In order to handle the most general case of system size and design, 
we designed our emulation to handle systems larger than 6000 gates 
(which is the largest system under this design to be totally control- 
store resident) . This is accomplished through the use of the block 
structure and the means to keep track of interblock connections as 
follows. For each gate there are four 18-bit words which are reserved 
for dedicated use. The first word is the "gate info word" containing 
the flag-bits (detailed in Figure V-2) and the CNT. counter for this 
gate. It is the lower three bits of this word, concatenated with the 
upper seven bits of the same word, which comprise the 10-bit nano store 
address used for branching to the various processing routines as des- 
cribed earlier. It is of importance to point out here that the three 
bits FF, T, and the "z or x" flag, are used as a "nanostore page address" 
and therefore must be placed into F-register FIDX prior to branching 
to any routine. This enables the use of the micro-instruction decoding 
facility of the QM-1, which concatenates the 3-bit page address found 
in FIDX with the 7-bit address found in the C-field of local store 
register 31, to form a 10-bit absolute nanostore address. Thus the 
decoding of all flag-bits for each gate can be done "instantly" by 
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In The Structure 



Figure V-2 BLOCK STRUCTURE IN MEMORY 
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placing the gate info word into R31, and the B-field of R31 into FIDX, 
and then invoking the micro-instruction decoding facility. This causes 
a branch to a dedicated routine which sets/resets the flag-bits as 
necessary for its exact input conditions, and then returns to a common 
continuation location in the main processing routine. 

The second and third words of the gate block structure are labeled 
LINKj and LINK 2 . These are used as linkage address words in the linked- 
list x-queue structure which controls the flow of x processing. (LINK 2 
is used only for secondary queueing as discussed in section IV) . They 
contain the block number and gate number of the next gate in the queue. 

The fourth word is labeled CLINK and is the relative address of 
the start of the list of output gates (z-gates) associated with this 
X— gate. The number of output gates for each x-gate will vary from 
gate to gate of course, but for sizing considerations in this study 
we have averaged the fan-out factor at 2.0 output gates/x-gate. Thus 
we need 6 words per gate in each block (the four words described 
above and one word for each output gate). This gives an average size 
of 36000 words /block for a block size of 6000 gates. 

Because the blocks will generally vary in size, there is an additional 
table in main-store dedicated to locating blocks in memory. It is 
indexed by block number and contains the absolute main-store address 
of the start of each block and each block's size (as the number of 
18-bit words in the block) . This table is also shown pictorially in 
Figure V-2, and is particularly useful in aiding the process of block 
loading (from main-store into control-store and back again). 
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There is one more table of interest in this design. This is a 
control-store resident "free core pool" table which is dedicated for 
use as a linked-list queue. VJhen, during normal x-gate processing, an 
output gate (z-gate) comes up for processing which does not reside in 
the same block as the x-gate, then it becomes necessary to queue such 
z-gates for future processing (when the appropriate block is loaded 
into control-store). This free core pool queue is used to queue these 
z-gates for "pre-x-cycle processing", and is termed the "z queue". 

It is depicted in Figure V-3. 

Now, returning to Figure V-1, the first action is to process 

(pre-x-cycle process) the z-gates waiting in the z queue. So if the 

queue is non-empty (and since the gates are ordered sequentially by 

block number/gate number) , the next z-gate in the queue dictates what 

block should be loaded from main-store. When the loading is completed, 

those z-gates in the queue which reside in this block are processed 

as normal z-gates in the following manner: The CNT counter for each 

z-gate is updated according to the value of the x-gate for which the 

z-gate is an output. (If = 1, CNT^ is incremented, and if 

V =0. CNT is decremented. For z-gates whose x-gate is in some 
lx z 

other block, is passed as the sign-bit of the queue element 

itself, as shown in Figure V-2, similar to the "end-of-CLINK" flag-bit K.) 
^ If this CNT transitions into or out of the value zero, then the 
z-gate becomes "non-null" and a branch is taken using the micro- 
instruction decoding facility to the appropriate z-processing routine. 
Upon return from this routine, the z-gate has been queued into the 
x-queue (using LINKj^, etc.) for future x-processing. Those z-gates 
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Figure V-3 Output Gate (Z) Queuing 
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whose CNT does not transition into or out of zero need no other processing, 
so they (and the returned non-null z-gates) are simply restored into 
the control-store block for future reference. This is the procedure for 
normal z— processing and occurs in Figure V— 1 in the boxes marked with an (*) 

Following the pre—x— cycle z— processing, we begin normal x-processing . 
This includes the two loops and the same basic logic shown in Figure 
III-l, with the actual x-gate processing (shown as the box marked with (**) 
in Figure V-1) accomplished via the micro-instruction decoding facility. 
Note an additional difference exists here, in that z-gates to be 
processed for a given x-gate must be checked first to see if they 
reside in the current block in control-store. If they do not, they 
are added into the z queue for future processing. And if they are in 
this block they are processed as described above for normal z-gate 
processing. 

The implementation of this algorithm has been coded in nanocode, 
and the timing studies discussed in section VI and Appendix B for non- 
control— store resident systems are entirely based upon this implementation. 
For systems which are completely control-store resident (system size ^ 

6000 gates), the timing studies presented are based upon the normal x 
anH z processing loops of this implementation, skipping the sections 
of code dealing with z queueing. This is reasonably accurate as an 
estimate of resident system timing considerations. However, the linkage 
words and CLINK "addresses" in memory are still in "block number/gate 
number" form, and hence incur additional unneeded overhead for absolute 
address computation, etc. If this code were optimized to be a strictly 
control-store resident emulation (instead of the generalized "handle all 
cases" emulation now coded), the efficiency of processing could be signifi' 
cantly Increased, and perhaps a 20%-30% timing improvement realized. 
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VI. TIMING RESULTS 

In order to gain an understanding as to the applicability of the 
algorithm described in the previous sections, we derived two equations 
which enable the generalized projection of the timing factors involved 
for various kinds of systems. (The derivations of these relations and 
examples of their use are presented in Appendix B.) As an initial, 

"most simple" case, we examined systems which reside totally in control- 
store (and therefore need no data storage externals to control-store nor 
the associated loading and linkage software.) The amount of time 
necessary to emulate a single machine cycle is given in T-periods by 
the relation: 


T 


resident 


(68 + 3bF)x + 29y 


where F is the fan-out factor defined to be the number of output gates 
per gate processed (or the number of z gates per x gate) ; x is the number 
of gates changing value in the system; and y is the number of queue 
processing cycles needed to emulate the data propagation associated with 
a complete machine cycle. This can be thought of as the number of logic 
levels in the system. 


Figure VI-1 displays this equation plotted for F = 2.0 outputs/ 
gate, for the value of x ranging through 5%, 10%, 15%, and 20% of the 
system size. The datum of most interest here is that a single cycle for 
6000 gates is emulated at a slow-down factor of 3,492:1 for a real 
machine cycle of ^s, with 5% of the system changing. Notice the effect 
of changing F from 2.0 to 3.0 outputs/gate in Figure VI-2. A single 
cycle for 6000 gates at 5% changing now results in a slow-down factor 
of 4,322:1 for a l^s machine cycle. This change from a fan-out factor 


System Size 



Figure VI-1 Emulation Cycle Time For Control-Store 
Resident System: T=(68 + 35F) X + 29 Y 





25.0 



Figure Vh2 Emulation Cycle Time For Control -Store 
Resident System: T=(68 + 35F) X + 29 Y 




of 2.0 to 3.0 outputs /gate generally results in a 20% increase in slow 
down factor. The data points for these curves are given in Table VI-1. 

We can gain a more comprehensive understanding of how these curves 
relate to one another by translating them into "sample time" (i.e., the amount 
of time needed to emulate .1 second of real time execution on a l^s machine). 
Figure VI-3 shows the curves for F = 2.0 and 3.0 both for 5% of the 
system changing and for 20% of the system changing. Notice that a 
single sample for a 6000 gate system at 5% changing and F = 2.0, 
requires 5.82 minutes, where the same sample at 20% changing requires 
22.38 minutes. (Further data points are given in Table VI-2.) We 
can also see that changing F from 2.0 to 3.0 increases the sample 
time by the same 20% seen above. 

It is perhaps most useful to view this data from an experimental 
point of view, and to see how long it would take to run for example 
10,000 samples of .1 second real time each. Figure VI-4 displays this 
information. For a 6000 gate system with a machine cycle time of ^s, 
with 5% of the system changing, it takes approximately 40 days or about 
1 1/3 months to run a 10,000 sample test with F = 2.0. The data points 
for these curves are given in Table VI-3. 

In addition to the above described "most simple" case, we expanded 
our study to include larger systems whose size necessitates system 
residence in main-store with "blocks" of gates being loaded into control- 
store for processing. The equation for these systems in generalized 


form is as follows: 



1/10 Second Sample With l^s Machine Cycle (Real Time) 



Figure VI -3 Emulation Sample Time For 

Control-Store Resident System 




6000 Gate System With l»is Machine Cycle For 5% Of System Changing 
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T *j * “ blocks in system) X {(44) X (4 + F) X 

non-resident 

(block size) + 84 X (z prequeued) + 94 + 

(z queued) X [72 + (25) X (n - 1)] + 

(68 + 35F)x + 29y> 

where z prequeued, z queued, and n are variables associated with the 
processing of output gates which reside in blocks other than that of 
their input gates (see Appendix B for exact details). In order to 
gain an appreciation for the meaning of this relation and the processing 
involved in such an emulation system, we will consider a system with 
only two blocks, each block sized at the maximum available control-store, 
6000 gates. Thus we have a 12000 gate system. Assuming 5% changing 
and F = 2.0, as above, we find that a single cycle on a ^s machine 
takes 260,579,630ns. This is a slow down factor of 260,580:1. A 
single .1 second real time sample would take 7.238 hours, so a 10000 
sample test would require 8.26 years! As you can see this is not a 
feasible approach. The reason these numbers are so high is that main- 
store accessing is extremely slow. It requires 22 T-periods/word to 
transfer from main-store into control-store, which means to load a 
single 6000 gate block (with 6 words of data required for each gate) it 
takes (22) X (6000) X (6) = 792,000 T-periods = 63.36ms. Hence this 
overhead becomes quite prohibitive. 



Table Vl-la: Emulation Cycle Time (in MS) 

For Control-Store Resident System 
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VII. CONCLUSIONS 

As we stated in the introduction, the intent of our implementation 
was to prove the feasibility of using gate level emulation technology 
in support of data collection for reliability studies of fault tolerant 
digital avionics equipment. It is clear that to support statistical 
measures, the key potential problem is the time necessary to execute 
a sample run on the gate level emulation. If this time is too long, 
the task of running a statistically significant number of samples 
becomes overwhelming. We have therefore focused our feasibility 
determination on the execution speed of gate level emulation. 

As shown in Appendix B, our QM/1 implementation results in a 
1200:1 slow-down factor for a 2000 gate, control-store resident system 
within the constraints given in that appendix. This datum is also shown 
in section VI in the graphs although it is not explicitly noted since 
the 6000 gate example given there is the limiting case. This 1200:1 
slow-down compares very favorably with the best possible case for 
slow-down shown in section III of 600:1, since the implementation 
contains features which cause additional overhead for address calculation 
and system partitioning into blocks. The maximum resident system of 
6000 gates also falls in the reasonable range of 3500:1 slow down. 

On the other hand, the partitioned system case, of which the 12000 
gate, 260,500:1 slow-down is an example, is clearly not feasible for 
any reasonable number of samples. Based on this, we conclude that the 
gate level emulations should be restricted to control-store resident 
subsystems, which for the QM/1 works out to a maximum of about 6000 gates. We do not 
feel this is overly restrictive considering that we have seen gate level 
simulations of current technology micro-processors which fall in the 
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range of 2000 gates. Thus 6000 gates can represent a fairly substantial 
subsystem. Furthermore, by restricting ourselves to completely resident 
emulations, further economies in the implemented algorithm can be 
achieved as mentioned in Section V. We estimate that we can achieve 
about a 20-30% improvement in speed. 

The primary conclusion we can make based on the implementation for 
the QM/1 is that gate level emulation is feasible to do and provides 
the speed necessary for statistical studies of reliability. 

Although the implementation we did was based on the QM/1 archi 
tecture, the restrictions imposed by that machine do impose a limit 
on what is achievable. Examples are the 6000 gate limitation and the 
additional overhead necessary to decode ten bits rather than the seven 
that the QM/1 is set up for. Three possibilities come to mind in terms 
of providing the emulation support capability for the final facility. 

The first of these is to consider making hardware modifications to the 
QM/1. This could include expansion of the maximum permissable control- 
store size or the addition of a bus to connect main-store and control- 
store directly. Secondly, other micro-programmable machines may be 
more amenable to the application. And finally, the possibility of 
building a special purpose, gate level emulation machine should be 
considered. Such a machine might be readily assembled from 2900 series 
chips. All three of these possibilities will be considered in the second 


phase of the contract. 
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APPENDIX A 

UNIFORMITY OF GATE TREATMENT 
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One of the primary benefits of the NASA LRC gate level emulation 
algorithm is the concept of gate processing independent of the function 
of the gate itself. What this means in practical terms is that the 
algorithm does not need to keep track of the gate type and can handle 
ANDs, ORs, NANDs, and inverters all in exactly the same fashion. This 
appendix is intended to provide a brief description of how this is 
possible by discussing a few examples to illustrate the processing 
done and decisions made. 

In order to process the gates, two values are required. The first 
represents the current value of the gate. We will call this V. The 
second quantity relates to the number of inputs of the gate. We will 
call this value CNT (for count) . This number is the key to the pro- 
cessing and the distinction as to type of gate is characterized by the 
initial values assigned to CNT and V. 

In operation, whenever an input to a gate changes value (from 0 to 
1 or vice versa), the quantity CNT is operated on. For the 0 to 1 change, 
CNT is incremented by 1. For the 1 to 0 change, CNT is decremented by 1, 
The gate whose CNT is being updated will change value whenever CNT 
transitions either into or out of 0, That is, if either the old value 
of CNT is zero (before increment or decrement) or the new value of CNT 
is zero (after the increment or decrement), then the value V is changed 
(0 to 1 or 1 to 0 depending on current value). A few examples will 


best illustrate this. 


Example 1 : 3 input AND gate 



V=A*B*C 


The description of the action of the 3 input AND gate is best described 
by the following state diagram. 


I 
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Note that the left to right arcs represent an input going from 0 to 1 
while the right to left arcs represent an input going from 1 to 0, 

When we get to the rightmost state, all inputs are 1 and hence the output 
V is 1. In all the other states, at least one input is 0 and the 
output V is 0. Now suppose we let CNT = 0 for the case where all 
inputs = 1. The state diagram with CNT values in place of number of 
inputs = 0 is: 


t etnent Cu t g yr ^n tfewent;c /|Q- 




^Pcrement ^ 

. w « Value V-1 

Value V* 0 J 


Note that the arcs on this diagram represent exactly the same as on 
the previous diagram; i.e., left to right is an input going from 0 to 1 
and right to left is an input going from 1 to 0. The transition of 
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CNT into and out of 0 occurs across the, dotted line and value V does 
indeed change when we cross this line. From this diagram, we see that, 
to initialize CNT and V for an n- input AND gate, we first assume all 
inputs are 0. We then set CNT = -number of inputs and V = 0. After 
initialization, we can blindly follow the specified processing and 
the proper gate output value will be produced. 


Example 2 ; 2 input NAND gate 


A 

B 


= A-B 



The NAND. gate is a simple extension to the AND, The only difference is the 
value of V. V will be 1 to the left of the dotted line and 0 to the 
right. Thus a 2 input NAND gate state diagram looks like; 



Initialization conditions are CNT = -number of inputs and V = 1, 


Example 3 : 4 input OR gate 



The state diagram for the OR gate looks like: 
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I ^ 
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I 

To isolate the CNT = 0 node in this case, we need to isolate the state 
to the left of the dotted line. Thus the CNT state diagram looks 
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Thus the initial condition (all inputs 0) for an OR gate are; CNT = ( 
V = 0. The extension to a NOR is made simply by using CNT = 0, V = 1 
for initial conditions. 


Example 4 : 2 input XQR gate 


V=A*B+A*B 


The state diagram for the XOR gate is; 
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The CNT state diagram becomes: 



This again is a simple case for the general algorithm. Initial 
values for the two input cases are CNT = -1, V * 0. This is also the 
case for the general *'odd" number of inputs type gate (i.e., 1 is produced 
for an odd number of inputs = 1). The only difference is that instead 
of signed arithmetic, modulo arithmetic is used (-1 mod 2=1). 

Based on these examples, the initial conditions (assuming all 
inputs 0) for the most common gates are given in Table A-1. The 
inverter can be handled as either a one input NAND or a one input NOR. 


Type 

Gate 

Initial Value 

CNT 

V 

AND 

-number inputs 

0 

NAND 

-number Inputs 

1 

OR. 

0 

0 

NOR 

0 

1 

INVERT 

-1 

1 

INVERT 

0 

1 

XOR 

-1 

0 

NXOR 

-1 

1 


Table A-1 
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Once the initialization has been done, the processing of each 
gate is exactly the same, regardless of type. In addition, the concept 
is flexible enough to be able to handle more non-standard type gates 
(e.g., the odd number of input counter which could be used for parity 
generation) . 



APPENDIX B 


DERIVATION OF EQUATIONS 

This appendix explains the basis for the system "emulation time" 
equations used in this report. Two basic equations are derived herein, 
the first for systems residing totally in control-store (e.g., system 
size s 6000 gates) ; and the second an approximation for larger systems 
which necessarily have only part of the system in control-store and the 
remainder in main-store. This appendix first explains the former of 
these and then gives some examples. Following this is the derivation 
of the second equation and then an example of its usage. 

It should be understood that these equations are derived from the 
actual implementation of the algorithm described in section V. 

The given timing considerations are simply the sum of the individual 
T— periods involved in executing the algorithm. (Thus when it is stated 
that x-processing takes 46 T-periods, this comes from examining the 
code itself. Recall that one T-period is the basic unit of time for 
nanocode, and is defined to be 80ns). 
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I. DERIVATION OF THE EQUATION FOR CONTROL-STORE RESIDENT SYSTEMS 

Given; 1) x-processing requires 46 T-periods/x and 29 T-periods/ 
logic level in the System (using 5 T-periods for 
most common case processing routine); 

2) z-processing requires: 

62 T-periods /non-null-z 
and 35 T-periods/null-z 

Where non-null-z 's are those output gates whose counter (CNT) 
transitions into or out of zero as it is processed during normal z— 
processing. This results in that z gate being queued as an x-gate for 
processing in a future cycle. Null— z's are z— gates whose counter 
does not transition into or out of zero and hence cause no further 
action to be taken. In addition the last z processed for each x 
takes 5 T-periods less than the other z's, hence: -5 T-periods/x. 

Adding these together we get 

T = (46 - 5)x + 62 X (# non-null z's) + 35 X (/; null-z's) + 29y 

resident 

with y = # logic levels in the system. (1) 

Now consider that 

1) the fan-out factor F = ^o^al # ^ 


\ 


and 2) 


that the # non-null z's is identically equal to the 
number of x's since each x comes from a non-null z. 
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Hence: F = z/x 

(# null z*s) + X = total # z's 
so (# null z's) + X = xF 

/. # null z's = xF - X = x(F-l) . (2) 

Substituting equation (2) into equation (1) we get; 


•J 


I 



T = (46 - 5)x + 62x + 35[x(F - 1)] + 29y 
= (41x + 62x + 35xF - 35x + 29y 
= 68x + 35xF + 29y 


T 


resident 


(68 + 35F)x + 29y . 


(3) 



Equation (3) is the general form of the control-store resident 
system equation. Now for the following examples let us assume that 5% 
of the system changes at any given time. Thus 

X = (.05) X (system size) (3.1) 


Additionally, in order to estimate the value y, the number of logic 
levels (or x— queue cycles) , we need to make an assumption concerning the 
system itself. To facilitate this assumption, we will deal with an 
intuitive concept called system shape. We assume the system is in 
general rectangular when the logic levels are plotted across the top 
of. the diagram and the gates per level down the side. For example, 
the RS flip-flop below is square (2 logic levels w/2 gates/level) 




Now, we recognize that, for larger systems, the general shape 
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will be a non-square rectangle with the long 

P 

direction; i.e., 

Q 


side in the vertical 


Where P ^ Q 


However, the algorithm will translate feedback (as in the flip- 
flop shown before) as additional logic levels. Because of this, the 
general shape of the system, as seen by the algorithm, will become more 
square. Therefore we will assume for the following examples that the 
system is square so that the number of logic levels (y) = number of 
gates per logic level = /system size . (3.2) 

Combining (3), (3.1), and (3.2) with a fan-out factor of 

F = // z's/x = 2.0 we get: 

T = (68 + 35(2.0)) (.05) (System Size) + 29/system size 

resxdent 

Hence : 

T ,, = (6.9) (System Size) + 29/system size (3.3) 

resident 

Example I ; System Size = 2000 gates: 

T . = (6.9) (2000) + 29(/2000 ) 

resident 

= 13800 + 29(44.725) 

= 15097 T-periods = 1207760ns 

Or a slow down factor = 1207.8:1 


for a l^s machine cycle. (12,077.6:1 for .1^ machine) 
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Example II : System Size = 6000 gates: 

T = (6.9) (6000) + 29(/6000“) 

resident 

= 41400 + 29(77.46) 

= 43646 T-periods = 3491680ns 
Or a slow down factor = 3491.7:1 

for a ^s machine cycle. (34,916.8:1 for . machine) 


B-6 


II. DERIVATION OF THE EQUATION FOR NON-CONTROL— STORE— RESIDENT SYSTEMS 

In order to accommodate large systems whose size prohibits 
residence of the entire system in control-store, in this emulation 
"blocks" of gates (large tables of gate data) are loaded into 
control-store one at a time for processing, while the remainder of 
the system being emulated resides in main-store. When a block is 
loaded, the first processing necessary is that needed for z gates 
residing in this block which were queued by previous x gates in other 
blocks. This uses an additional queue, dedicated to this situation, 
and so we have termed this initial z processing "prequeue" processing. 
So we define "z prequeued" as the number of z gates processed in this 
prequeue phase. Similarly, "z queued" is the number of z gates queued 
during the normal processing of each block onto this dedicated z gate 
queue . 

Now, given that: 

1) it takes 22 T-periods/word to transfer 18-bit words 
from main-store into control-store (and visa versa) , 
with an overhead of 28T-periods per block; 

2) pre-queue processing of z's takes: 

84 X (z prequeued) +66; 

3) there are A words /gate in the memory tables plus one 
word for each output gate (z) . Assume a fan-out factor 
= # z's/x = F. Then 4 + F words are needed in memory 


per gate. 
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So to begin with, equation (4) below accounts for the loading of the 
new block from main-store into control-store, and the processing of 
z's which were queued by some previous block. All of this occurs 
prior to the normal x and z processing: 

T = (44) X (4 + F) X (block size) + 84 X (z prequeued) 
pre 

+ 94. (^) 

Equation (4) will be in effect for each block as it is loaded, and includes 

the restoration of each block to main-store. 

In addition we must consider the processing of gates in the block 

while it is in control-store using equation (3). 

Furthermore, subroutine ZQUEADD is used to queue z gates who 
reside in blocks other than the current block in core (onto the dedicated 
z queue) . In this process each element in the queue is compared with 
the z being placed onto the queue to ensure sequential ordering of 
the queue (by block number and gate number) . This searching takes 
25 T-periods for each queue element searched which does not yield the 
position for the new z* Hence if there are m elements in the queue 
and n of these are searched for each z being added (including the 
element which reveals the location for the new z), we must add 
25 X (n-1) T-periods for each z placed onto the queue. Additional time 
is needed as well, but the searching accrues most of the queueing time. 

Thus for each block: 

T 

zqueadd 


(Z queued) X [72 + (25) X (n-1)] 


(5) 
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Hence if we combine equations (3), (4), and (5) we arrive at what 
seems to be a reasonable approximation equation for systems of more 
than 6000 gates: 


T 

resident 

T 

pre 

T 

zqueadd 


(68 + 35F)x + 29y (3) 

(4§) X (4 + F) X (block size) + 84 X (z prequeued) 

+ 94 

(z queued) X [72 + (25) X (n - 1)] (5) 


T = (T + T + T ,,) X (# blocks loaded) 

overall resident pre zqueadd 

(Where T . , is interpreted such that x and y are associated 
resident 

with block size instead of system size) 


Substituting we get: 

T * (# blocks in system) X { (44) X (4 + F) X (block size) 

+ 84 X (z prequeued) + 94+ (z queued) X [72 + (25) X (n - 1)] 
+ (68 + 35F)x + 29y} (6) 

Equation (6) is the general form of our equation. Now for the 
following example, we will make some assumptions concerning the system 
under consideration. First we need some way to approximate, as closely 
as possible, the number of output gates (z's) which are prequeued for 
a block by other blocks, and as well, the number which are queued by 
each block. If we assume a square system (as discussed previously for 
the resident system configuration) then the number of outputs for a 
block = /block size . Additionally if 5% of those outputs are changing, 
then we can assume 5% of /block size as a reasonable estimate for the 
number of z's to be prequeued for and/or queued by a given block. 
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Thus: z prequeued = z queued = .05 /block size . (7) 

Similarly we can say that the number of logic levels in the system 
= y = /block size (8) 

(based upon a square system configuration). Now we must gain an 
understanding of n, the number of elements in the queue which are 
searched in order to add each z queued into the queue. We know from 
(7) above that for each block, 5% of /block size is the number of gates 
queued by that block. Thus (# blocks in system) X (.05 /block size ) 
gives the maximum length of the z-processing queue at any time. Now 
we further assume that on the average, 50% of the queue needs to be 
searched for any given z. 

Hence: n = blocks in the system) X (.05/block size ) (9) 

Combining equations (6), (7), (8), and (9) we get: 

T = (# blocks in System) X [ ( 44 ) X (4 + F) X (block size) 

+ (84) X (.05/block size ) + 94 + (.05/block size ) 

X {72 + (25) X [(Jg) X (# blocks in System) 

X (.05/block size ) - 1] } + (68 + 35F)x + 29/block size ] . 

Combining terms gives: 

T = (# blocks in System) X {94 + (44) X (4 + F) X (block size) 
+ (.05/block size ) X [156 + (12.50) X (# blocks in system) 

X (.05/block size ) - 25] + (68 + 35F)x + 29/block size } 

= (# blocks in System) X [94 + (44) X (4 + F) X (block size) 
+ 7.8/block size + (0.03125) X (# blocks in system) 

X (block size) - 1.25 /block size + (68 + 35F)x + 


29/block size ] , 
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So T = (# blocks in system) X { 94 + (block size) X [(44) X (4 + F) 

+ (0.03125) X (# blocks in System)] + 35.55/block size 
+ (68 + 35F)x}. (10) 

Equation (10) is valid for all x and F in all systems meeting our 
initial assumptions. But in general we wish to use this as an aid 
in determining if this type of system is feasible. Hence let us 
further .assume that the number' of gates changing in the system at 
any given time is 5%. Thus x = 5% of the block size. Furthermore, 
assume a fan-out factor of two output gates per gate so F = # z’s/x 
= 2.0. Substituting these into equation (10) we get: 

T = (# blocks in System) X {94 + (block size) X [(44) X (4+2.0) 

+ (0.03125) X (# blocks in system) + (.05) X (68 + 35 X 2.0)] 

+ 35.55/block size }. 

In simplifying this equation, for the purpose of understanding the 
nature of this relation and its applicability to real systems, we assume 
the most simple case in which block processing occurs sequentially 
without interblock feedback. This means that the total system size 
= (# blocks in the system) X (block size) , and hence we can substitute 
system size into the equation for the term which contains this product. 
Realizing that this is not the general case, it is understood that 
for more complex systems, the time T will be greater than that which 
is given in this equation. 

Thus by combining terms and substituting system size appropriately 


we get: 
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T = (// blocks in System) X [94 + (0.03125) X (system size) 

+ (270.9) X (block size) + (35.55/block size )]. (11) 

Equation (11) gives T (in T-periods/cycle) for simple case systems 
with F = 2.0 and 5% of the system changing. An example of its use 
follows. 

Example of Non-Resident System : 

System size = 12000 gates. 

Block ^ize = 6000 gates. 

(hence # blocks = 2) 

T = 2[94 + (0.03125) (12000) + <?70.9 ) (6000) + (35.55) (77.46)] 
= 2©4 + 375 + 1,625400+2753.7) 

= 2(1,628,622.7) = 3,257,245.4 T-periods 

= 260,579,630ns 

Or a slow down factor of 260,580:1 for a cycle machine. 
^,605,796:1 for a . ^s machine) 

Compare this to the 3491.7:1 slow down for a resident system of 
6000 gates. (34,917:1 for . l^s system) 
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NANOCODE FOR BEST CASE TIMING ESTDIATE 


This appendix contains the "minimum" nanocode to implement the 
processing necessary for the algorithm as defined in section III. The 
structure shown in Figure III-3 is assumed and the following Local 
Store (LS) register assignments are also assumed: 


LS register 

X 

y 

z 

w 

a 


contents 

address of current x gate info word 

address of address of current z gate info word 

address of current z gate info word 

MLINK 

constant integer 2 


b 


scratch 
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A 



A 
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The x-gate processing (read of info word) and multi-way branch assumes: 


gate info word address in LS register x. 

XPROC: BRANCH (N, + 1) 

KA = X 

KB = 31. 

X. . , KA -> FCIA, KB -> FCOD 

4T .X.. 

..S. READ CS(CIA), GATE CS, 
READ NS, GATE NS 

• • • • • 

X... B-i-FIDX 
.X.. INCF-»-FIDX 
5T ..X. LOAD NPC(CS) 

...S READ NS, GATE NS 


Set up to read info word into 
R31. 

CS bus wait. 

R31-«- gate info word. 

Set up top 3 bits of NS address. 

A 1 in bit 0 signifies x processing. 

Set up NFC for branch based on 
FIDX and top 7 bits of gate info 
word. 

Branch through micro-op-code. 


9T 
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1 



Read of CLINK for this x assumes: 

- gate info word address is in LS register x 

- constant 2 is in LS register a 

- scratch register is LS register b 

- gate info word address for next output (z gate) is in LS register z 


: BRANCH (ZPROC) 

KALC = ADD 
KA = X 
•KB = a 
KX = b 
KT = y 

X. . . KB ^ FAIR, KA -> FAIL, 
KX FAOD 

.S.. GATE ALU, KT FOOD, 

KX -»■ FCIA. 

6T ..X. 

...X READ CS(CIA), GATE CS, 
READ NS, GATE NS 


Set up to get b = x4a (z = x+2) 

Register b x+2 (addr CLINK into 
b ) . Set up to read CLINK value 
into y. 

CS bus wait. 

y CLINK (address output gate 
list) . 


6T 


This then proceeds to Z processing (ZPROC) . 
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Set up for next x-gate assumes; 

- current gate info word address is in LS register x 

- scratch register is LS register b 

: BRANCH (XPROC) 

KALC = INCR LEFT 
KA = X 
KB = b 

X KA ^ FAIL, KB -v FAOD, Set up to increment address to 

SET CIH get addr of LINK. 

6T .S.. GATE ALU, KB -»■ FCIA, Scratch register b x+1. Now 

KA ->■ FCOD set up to read that LINK. 

..X. CS bus wait. 

...S READ CS(CIA), GATE CS, x address next gate info word 

(LINK of current word) . 

READ NS, GATE NS Then go to XPROC. 


6T 



' V 



gate processing set up and branch assumes: 



address of the address of this z 
register y 

gate info word is in LS 

- 

address 

of this z gate info word 

in LS register z 

ZPROC: 

• • • • 

BRANCH (N. + 1) 
KALC = INCR LEFT 
KA = y 
KB = z 
KX = 31. 



X... 

KB ^ FOOD, KA -»■ FCIA, 
KA ^ FAIL, SET CIH 

Set up to read gate info word 
for this gate; and to incre- 
ment the address to point to 




next gate address. 

QT» 

-/ X 

.X.. 

^ FA.OD 

Want to write the new address back. 


. .s. 

READ CS(CIA), GATE CS, 
GATE ALU, READ NS 

Register z ^ addr gate info 
word, y y^l (Next gate addr). 


...X 

KB FCIA, KX FCOD, 
GATE NS 

Set up to read gate info word 
into R31. 

5 

• • • • 
X... 

BRANCH (N.+l) 

CS bus wait. 

4T 

.s. . 

READ CS(CIA), GATE CS, 
READ NS 

R31 gate info word for this z. 

• 

..X. 

B -»• FIDX, GATE NS 

Set up top 3 bits of NS address. 

• 

3T 

• # • • 

X... 

LOAD NPC(CS) 

Set up multi way z branch. 


.s. . 

READ NS, GATE NS 

And go . 


C-6 


Addition 


6T 


of z-gate to link queue assumes: 

- This z gate info word address is in LS register z 

- Scratch register LS is register b 

- MLINK is in LS register w 


BRANCH (N.+l) 

KALC = INCR LEFT 
KA = z 
KB = b 
KX = w 

KS = PASS LEFT 

X... KA -*■ FAIL, KB -► FAOD, 
SET CIH 

,S.. GATE ALU, KB -► FCIA, 

KA -»■ FCID, 

KA -> FAIL 

. .X. G(G KS); G -»■ KALC 

KX -»• FAOD 

...S WRITE CS(CIA), 

GATE ALU, 

READ NS, GATE NS 


Set up to find addr of LINK 
for this z. 

Scratch register b addr LINK. 
Now set to write MLINK into this 
LINK, 

And to set MLINK to address of 
this z info word. 

Change ALU to PASS LEFT this gate 
info word addr to MLINK. 

Set LINK to MLINK. Set MLINK to 
addr of this gate info word; 

And continue. 


6T 
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For each x-gate, the last z output test assumes: 

- LS register contains address of this gate info word 

- bit ,17 is set for the last z in list 


« 

• 

• • • • 

BRANCH (XPROC) 

ALT BRANCH 
KALC = PASS LEFT 
KT = SIGN 
KA = z 


• 


X... 

KA FAIL 

Test sign bit. 


4/6T 

.X.. 

LOAD NPC(SEQ) 

ALT. BRANCH to next 

word. 


. .s. 

READ NS, GATE NS(T) 

Branch to XPROC for 

sign = 0- 


= = =s 

READ NS, GATE NS 

Otherwise continue 

to N.+l. 

: 

• • • • 

BRANCH (ZPROC) 



2T 

s. . . 

READ NS, GATE NS 
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